Reputation: 3845

Reconstructing 3D locations of corresponding points

I'm working on a project where I would like to reconstruct the 3D locations of feature points I've extracted from my camera images. The idea is to:

Make a camera recording (Greyscale information, VGA size: 640 x 480)
Extract feature points in the camera frames (I'm using SIFT for this)
Correspond features from frame[k-1] with features from frame[k] (I intend to use RANSAC for this, more on that later...)
Calculate/estimate some relative distance information between these feature points (this would be in some (x,y,z) coordinate system)

I've read in many papers that RANSAC is an algorithm that is used in reconstruction, with the end result being some kind of point cloud. I want to be able to do just that. However, I've ran into a few snags, and I hope you guys can help me out with these.

The first snag is that I do not really understand how I would be able to use RANSAC to perform this point correspondence. I understand the concept of RANSAC being a model-fitting tool, I just don't see how it could be used for doing correspondence solving.

The second snag is, assuming I have my correspondence information, how to get some kind of distance information between all these points. I've read that perspective projection could be used to solve this, and in turn one should try to estimate the Fundamental Matrix. Then do some math magic to be able to get the point cloud. Point is, I don't understand what the actual values in a Fundamental Matrix mean. I know it's gives a mathematical relation between the position of 2 cameras (or in my case, 2 frames in a video there the camera is moving), and that it exploits epipolar geometry. But besides this, I just don't have a clue what the Fundamental Matrix actually entails. How is this 3x3 matrix capturing the 6DOF of 1 camera with respect to another? Also I think the 'math magic' I referred to are some kind of matrix multiplications, but I haven't found any informational source to explain me what it does and what the formulation is.

Therefore, my question is: Could anyone of you point me into the right direction? I've been digging through the references of the papers I've read so far, but these also give me the "we solve this using the RANSAC algorithm"-line and I'm getting more the feeling I'm looking in the wrong direction. Is there some nice explanation of these things, perhaps in laymen's terms and/or with some illustrations? In short: where should I be looking or where can I find this elusive piece of information?

Thanks in advance, Xilconic

PS: Checked wikipedia, but it's not helping me much. Also listened to the 'Fundamental Matrix Song', and it's the same story.

Upvotes: 2

Answers (3)

peakxu

Reputation: 6675

A robust solution to a fundamental matrix using something like a 5 point or 8 point algorithm is certain a good start. That said, fundamental matrix solution can be susceptible to outliers and you'll likely want some additional overarching system to do the actual 3D solving. You can use a Kalman Filter type approach (fast, can be done in real time on embedded systems) or bundle adjustment (very accurate but can be slower).

Some good SFM software out there that you can use or draw inspiration from:

VSLAM (developed by Konolige who is a professor at Stanford and also works at Willow Garage, the OpenCV folks). Probably the fastest bundle adjustment solution I've seen.

RSLAM (developed by the Oxford Mobile Robotics Group, showing some excellent results)

Upvotes: 1

vidstige

Reputation: 13085

Wrote my thesis on this, also using the RANSAC algorithm in my paper.

There is more to this topic than can be captured in a few paragraphs here. Consider getting the excellent book Multiple View Geometry.

Snag 1

RANSAC will find a model, in this case the fundamental matrix F, even in the presence of huge amount of outliers. In this case, some point-correspondance candidates are way off. This is an outlier. Basically you just keep fitting the F matrix from randomly drawn points. Eventually you find some set of points that together creates a consistent model. These are the inliers. They can now be used to estimate the model (F) more accurately.

There is an easy example in my paper with a line-fitting example to get you started and a easy-to-grasp explanation of RANSAC applied to the correspondance problem.

Snag 2

The most important thing about the F matrix is that it maps a point in one image to a line in the other:

Fx = l' where x is a point in one image and l' is a line in the other.

The F matrix has 9 elements but must have rank 2 and also the scale does not matter, thus it has only 7 degrees of freedom. There is no easy explanation for the elements of the F matrix.

Using a point correspondance x <-> x' and F the world 3d coordinate, X, of the depicted point can be extracted if you know the cameras internal parameters, like focal length.

Note that when using consecutive movie frames the camera usually moves very little and it might be hard to compute the fundamental matrix. It can be worked around though. I suggest looking into the works of Marc Pollefeys'

Upvotes: 4

jilles de wit

Reputation: 7148

Look at the first formula in the wikipedia entry on the fundamental matrix:

enter image description here

This is the "model" you are trying to solve using RANSAC. You have two 3xn (n>=7) matrices x and x' that represent all your corresponding x,y - x',y' points in both images (the 3rd coordinate is just the number 1 all the time). And an unknown 3x3 matrix F for which you want to find out the values. The pseudocode algorithm for RANSAC in the wikipedia entry is a pretty good explanation.

Now, what is the fundamental matrix? One way to think of a point in an image is as a 3D line connecting the camera position and that point in 3D space. This line extends to infinity in both directions. If you look at a 3D point on that line with a different camera then in the image from that camera you see a line going right across it. The transformation (projection really) of a point in an image to a 3D line is just a matrix operation. The projection of a line in 3D onto a 2D image is also a matrix operation. F captures both these matrix operations in one matrix. F can also be used to determine the camera matrix of both camera's, which can then be used for the 3D reconstruction.

Maybe this helps a bit? Otherwise, I've learned most I know about this from Hartley and Zisserman.

Upvotes: 2

Reconstructing 3D locations of corresponding points

Answers (3)

Snag 1

Snag 2

Related Questions