user75

Reputation: 133

Up to scale in 3D triangulation using epipolar geometry

I'm currently working on a project in which I have to estimate 3D coordinates of 2D interest points detected using a monocular camera.

To be more precise, my input is a calibrated image sequence, and whenever a new image arrives I have to triangulate points between the previous image (taken as the left view) and the current image (taken as the right view) to obtain 3D points.

To do this, I'm following these steps (sketched in code after the list):

  1. Extracting key-points in the current image
  2. Establishing correspondences between the current and the previous image
  3. Computing the essential matrix E using RANSAC and the eight-point algorithm
  4. Extracting the rotation matrix R and the translation vector T from E
  5. Computing the 3D points using triangulation via orthogonal regression
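
Below is a minimal OpenCV sketch of these five steps, assuming a known intrinsic matrix K (the values shown are placeholders, not from the question). Two hedges: cv2.triangulatePoints uses a DLT rather than orthogonal regression, and the T recovered from E is a unit vector, which is exactly the scale ambiguity discussed below.

    import cv2
    import numpy as np

    # Hypothetical intrinsics -- replace with your own calibration.
    K = np.array([[700.0,   0.0, 320.0],
                  [  0.0, 700.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    def triangulate_pair(img_prev, img_curr):
        # 1. Key-points (and descriptors) in both images
        orb = cv2.ORB_create(2000)
        kp1, des1 = orb.detectAndCompute(img_prev, None)
        kp2, des2 = orb.detectAndCompute(img_curr, None)

        # 2. Correspondences between the previous and the current image
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

        # 3. Essential matrix with RANSAC
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)

        # 4. R and T from E (T comes back as a unit vector: scale is unknown)
        _, R, T, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

        # 5. Triangulate the inlier correspondences
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, T])
        inl = pose_mask.ravel() > 0
        pts4d = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
        return (pts4d[:3] / pts4d[3]).T, R, T   # homogeneous -> Euclidean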

The resulting 3D points are not correct when I reproject them onto the images. But I have read that triangulated points are only defined up to an indeterminate scale factor.

So my question is: what does "up to scale" mean in this context? And how do I get the real 3D points in the scene's world coordinate frame?

I would be thankful for any help!

Upvotes: 6

Views: 3387

Answers (2)

Francesco Callari

Reputation: 11785

You are likely to have a bug or a poorly estimated essential matrix. The unknown scale factor cannot be responsible for the reconstruction errors you see. Regardless of the global scale, the result of projecting onto the image pair a 3D point estimated from good matches and with a valid essential matrix should be consistent.
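
One quick way to verify this, sketched with OpenCV (the helper name is hypothetical): reproject your triangulated points into one of the images and measure the pixel error against the measured keypoints. With good matches and a valid E it stays small no matter what the global scale is.

    import cv2
    import numpy as np

    def mean_reprojection_error(pts3d, pts2d, R, T, K, dist=None):
        # Reproject the triangulated 3D points with the recovered pose
        # and compare against the measured 2D keypoints (in pixels).
        rvec, _ = cv2.Rodrigues(R)   # rotation matrix -> rotation vector
        proj, _ = cv2.projectPoints(pts3d.astype(np.float64), rvec,
                                    T.astype(np.float64), K, dist)
        return np.mean(np.linalg.norm(proj.reshape(-1, 2) - pts2d, axis=1))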

The meaning of "up to scale" in this context is that, even with intrinsically calibrated cameras, the standard method of estimating the essential matrix yields the same result if you replace your scene with one in which everything is larger or smaller by the same amount. You can remove this ambiguity either

  • Before the fact, by calibrating the camera extrinsic parameters, i.e. the location and orientation of one camera with respect to the other, using a method that fixes the scale. For example, using a calibration object of known shape and size.
  • After the fact, at stereo reconstruction time. For example, by identifying in the scene an object of known physical size and imposing that your computed 3D reconstruction matches that size (see the sketch below).
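
As a hedged sketch of the second option (all names here are mine, not a standard API): if two of the triangulated points are known to be d_known meters apart in the real scene, the whole reconstruction, including the baseline T, can be rescaled to match.

    import numpy as np

    def fix_scale(pts3d, T, i, j, d_known):
        # Rescale so that the distance between triangulated points i and j
        # equals the known physical distance d_known (e.g. in meters).
        d_est = np.linalg.norm(pts3d[i] - pts3d[j])
        s = d_known / d_est
        return pts3d * s, T * s   # scale points and baseline consistently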

Upvotes: 3

PeterNL

Reputation: 670

In my experience with Structure from Motion, extracting a point cloud via the fundamental matrix / essential matrix leads to a point cloud of arbitrary size. As I understand it, this is because the fundamental/essential matrix is valid if

x1^T * F * x2 = 0.

This equation also holds if F is multiplied by any nonzero scale factor s, since x1^T * (s*F) * x2 = s * (x1^T * F * x2) = 0. So the absolute scale cannot be recovered from F.

What worked really well for me is to extract the new camera positions from the existing point cloud, which was computed from earlier image pairs. For that you have to remember the 2D-3D correspondences from the previous images. This is called Perspective-n-Point camera pose estimation (PnP), and OpenCV has methods for it (a sketch follows below).
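
A hedged sketch of that step with OpenCV (variable and function names are placeholders): given 3D points from earlier pairs and their 2D observations in the new image, cv2.solvePnPRansac recovers the new camera pose directly in the existing cloud's frame, so the scale stays consistent with the cloud.

    import cv2
    import numpy as np

    def pose_from_cloud(object_points, image_points, K, dist_coeffs=None):
        # object_points: Nx3 points already triangulated from earlier pairs
        # image_points:  Nx2 observations of those same points in the new image
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            np.asarray(object_points, dtype=np.float64),
            np.asarray(image_points, dtype=np.float64),
            K, dist_coeffs)
        R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
        return R, tvec, inliers      # pose shares the cloud's scale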

Here is a related question that summarizes structure from motion:

Structure from Motion, Reconstruct the 3D Point Cloud given 2D Image points correspondence

Upvotes: 0
