Chris Venter (MSc), Pieter Rautenbach (MSc), Jan-Hendrik de Vaal (MSc), Francois Malan (MSc), Barry-Michael van Wyk (MSc)
Suppose a rigid object moves relative to a video camera in such a way that different views of the object are recorded in the video sequence (see the video). Because different views are recorded, there is 3D information implicit in the video sequence. A typical task of Structure-from-Motion (SfM) is to extract this 3D information from the video sequence.
The video shows a mannequin that is moved by hand on a flat surface. This is the only information available to us, and using this information we want to do a 3D reconstruction of the mannequin.
The first ingredient of an SfM system is a tracker that follows features in the video sequence. In this case the features are the dots painted on the mannequin (we could not find a student willing to have his/her face painted). The actual reconstruction consists of the 3D coordinates of the features, as well as a reconstruction of the motion (rotation and translation). It is important to note that any reconstruction is only up to an unknown scale factor.
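The scale ambiguity can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole camera with unit focal length; the function `project`, the point set, and the pose values are hypothetical and chosen only for illustration. Scaling both the 3D points and the translation by the same factor leaves the image coordinates unchanged, so the factor cannot be recovered from the images alone.

```python
import numpy as np

def project(points, R, t, focal=1.0):
    """Pinhole projection of Nx3 points for a camera pose (R, t). Illustrative helper."""
    cam = points @ R.T + t                     # object -> camera coordinates
    return focal * cam[:, :2] / cam[:, 2:3]    # perspective division

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 3))   # hypothetical feature positions

angle = 0.3                                    # rotation about the y axis (assumed)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, -0.2, 5.0])                 # keeps all points in front of the camera

scale = 2.5
image_original = project(points, R, t)
image_scaled = project(scale * points, R, scale * t)

# True: the two scenes produce identical images.
scale_is_unobservable = np.allclose(image_original, image_scaled)
```

An object that is 2.5 times larger and 2.5 times further away looks exactly the same, which is why extra information, such as a known distance between two features, is needed to fix absolute size.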
There are different ways to do the reconstruction. In this demonstration the fact that there is little change in view from one video frame to the next is exploited.
Starting from an initial estimate of the 3D reconstruction, each video frame is viewed as a new observation that can be used to update the current estimate. Since we do not have any depth information, our initial estimate is a flat face (z coordinates equal to zero). A nonlinear extension of the Kalman filter (we generally prefer the unscented Kalman filter) is used to update the current best estimate based on the next video frame. In the video (prepared by Pieter Rautenbach) one can see how the estimates improve with more measurements (video frames) until they settle down after about 100 frames.
The Kalman filter also requires a dynamic component describing the motion between two observations. Assuming that there is little change from one frame to the next, the identity map is specified with all the uncertainty absorbed by the error term.
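The role of the identity map can be illustrated with a small sketch. It is not the system described here: it replaces the unscented Kalman filter and the nonlinear camera measurement with an ordinary linear Kalman filter that observes the state directly, and all names and noise levels (`kalman_step`, `Q`, `R`, `true_state`) are assumptions made for illustration. What it does show is the predict step under identity dynamics, where the estimate is carried over unchanged and only the covariance grows by the error term.

```python
import numpy as np

def kalman_step(x, P, z, H, Q, R):
    """One predict/update cycle with identity dynamics x_k = x_{k-1} + w_k."""
    # Predict: the identity map leaves the estimate unchanged;
    # all motion uncertainty enters through the process noise Q.
    x_pred = x
    P_pred = P + Q
    # Update with the new measurement z = H x + v (linear stand-in).
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(1)
true_state = np.array([0.4, -0.7, 1.2])   # hypothetical quantities to estimate
n = true_state.size
H = np.eye(n)                             # assumed: state observed directly
Q = 1e-6 * np.eye(n)                      # small process noise (assumed)
R = 0.05**2 * np.eye(n)                   # measurement noise (assumed)

x = np.zeros(n)                           # flat initial guess, analogous to z = 0
P = np.eye(n)                             # large initial uncertainty
initial_error = np.linalg.norm(x - true_state)
for _ in range(100):                      # one update per "frame"
    z = H @ true_state + rng.normal(0.0, 0.05, size=n)
    x, P = kalman_step(x, P, z, H, Q, R)
final_error = np.linalg.norm(x - true_state)
```

In the real problem the measurement is the perspective projection of the features, which is nonlinear in the state, and that is where the unscented transform takes the place of the matrix `H`.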
Francois Malan investigated the use of SfM for the docking of two satellites. Using markers in known positions on one satellite, SfM is used to calculate the relative orientation and velocity of the two satellites. Since the markers are in known positions, which provides the scale, the reconstruction yields absolute values.
SfM is also widely used for UAVs.
The problem described above is relatively straightforward because one can assume that there is little relative motion from one frame to the next, and the camera parameters remain the same. A more difficult situation occurs when one cannot assume any of that.
Imagine that you have collected a number of photographs of a historic site (two of which are shown), taken by different tourists, using different unknown cameras, from different unknown viewpoints, before the site was destroyed. You are interested in a 3D reconstruction of the site based on the available photographs. Note that you have absolutely no information about the cameras: you cannot even assume that the same camera was used, nor do you know anything about the viewpoints. In this case you cannot assume that there is little change from one photograph (the counterpart of a frame in the previous case) to the next, so the Kalman filter approach is not applicable. Again, the first step is to find corresponding features in the different photographs. These are then used to estimate the calibration parameters of the different cameras, after which a 3D reconstruction is possible.
The two views are from the reconstruction by the system developed by Jan-Hendrik de Vaal. Note that the lack of detail in the face is due to the scarcity of trackable features.