Manas Macherla

Reputation: 21

Extracting the scale of translation vector that I got from the essential matrix

I want to get the extrinsic parameters of two cameras looking at the same scene. For this I followed the procedure laid out in several textbooks and lectures:

  1. Computed matches between the two images using SIFT.
  2. Computed the essential matrix using OpenCV's cv2.findEssentialMat.
  3. Recovered the correct R and t from the four possible solutions using cv2.recoverPose().

From my understanding the translation is only recovered up to scale. What do I have to do to get the absolute translation? I do not have any known objects in the scene, but I may have lane lines; is there a way to use the lane line information to get the absolute translation? A rough sketch of my pipeline is below.
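This is roughly what I am doing (the image paths and the intrinsic matrix K are placeholders for my actual calibration):

```python
import cv2
import numpy as np

# Placeholder inputs: two images of the same scene and the camera intrinsics K.
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])

# 1. SIFT keypoints and descriptor matches (Lowe ratio test to keep good ones).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2. Essential matrix from the point correspondences.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)

# 3. Recover the relative pose; t comes back as a unit vector,
#    i.e. the translation is only known up to scale.
retval, R, t, mask_pose = cv2.recoverPose(E, pts1, pts2, K)
print("Relative rotation:\n", R)
print("Translation direction (unit length):\n", t)
```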

Upvotes: 2

Views: 1623

Answers (1)

Max Crous

Reputation: 421

I found this post on dsp stackexchange that partly addresses your problem. As you have found, the scale of the translation cannot be inferred from the essential matrix; you need more information. This makes sense, as there is an ambiguity in absolute size if your only information is point correspondences.

How to infer scale
If you need to know the camera translation scale, you will need to know some scene geometry, that is, something you can use as a reference to determine the extent of the translation, e.g. the coordinates of a calibration object in the scene. You could then use a pose estimation method such as Perspective-n-Point (PnP). I found this lecture by Willem Hof on PnP, which includes code screenshots, quite clear and concise.
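A minimal sketch of what that could look like with OpenCV's solvePnP, assuming you know the metric 3D coordinates of a few reference points you can locate in the image (all point values and the intrinsics here are made up):

```python
import cv2
import numpy as np

# Hypothetical reference geometry: 3D coordinates (in metres) of points you can
# identify in the image, e.g. lane-marking corners with known spacing.
object_points = np.array([[0.0, 0.0, 0.0],
                          [3.0, 0.0, 0.0],
                          [0.0, 3.5, 0.0],
                          [3.0, 3.5, 0.0]], dtype=np.float32)

# Their pixel coordinates in one image (hypothetical values).
image_points = np.array([[410.0, 620.0],
                         [880.0, 615.0],
                         [400.0, 410.0],
                         [870.0, 405.0]], dtype=np.float32)

K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)  # assuming an already-undistorted image

# PnP gives the camera pose relative to the reference points in metric units.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
R_cam, _ = cv2.Rodrigues(rvec)
print("Camera rotation:\n", R_cam)
print("Camera translation (metres):\n", tvec)
```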

Note that when performing PnP you have multiple unknowns. Your first camera was assumed to be [I|0], so its absolute pose is entirely unknown. Once the first camera's pose P1 is known, the second camera's pose is P_rel · P1 (the relative pose from the essential matrix applied to the first camera's pose), and only one unknown parameter is left for the second camera: the scale of its translation. A sketch of that composition follows.
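Roughly, and assuming the first camera's pose R1, t1 came from PnP and R_rel, t_u came from recoverPose (all names here are placeholders):

```python
import numpy as np

# Hypothetical inputs:
#   R1, t1     : first camera's absolute pose, e.g. from PnP (metric units)
#   R_rel, t_u : relative pose from recoverPose, where t_u has unit length
#   s          : the single remaining unknown, the metric length of the baseline
def second_camera_pose(R1, t1, R_rel, t_u, s):
    """Compose the first camera's absolute pose with the scaled relative pose."""
    R2 = R_rel @ R1
    t2 = R_rel @ t1 + s * t_u
    return R2, t2

# If one metric distance in the scene is known (say, the gap between two
# reference points measured as d_reconstructed in the up-to-scale
# reconstruction and known in reality as d_true), the scale follows as
#   s = d_true / d_reconstructed
```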

Why you cannot infer scale of translation
For example, if you have two images of a ball and many point correspondences, taken with calibrated cameras with unknown positions and poses: is it a normal football or a mountain-sized ball sculpture? Well, we could use the essential matrix to get the relative poses of the two cameras and triangulate a 3D reconstruction of the ball. But would we know the scale? Sure, we know the shape of the ball now, but what is the distance between the triangulated points? That information is not present. You can infer the cameras' relative rotation; one is in front of the ball (denote this camera as [I|0]) and the other is to the side of the ball. You also know in which direction the camera traveled (the translation), but not how far. For a larger object, the translation would simply be of a larger scale. Still, you do know the relative translation direction and the relative rotation of the two cameras from the essential matrix decomposition, which is a valuable constraint.
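To illustrate, here is a small numerical sketch (intrinsics and points made up) showing that scaling the whole scene and the baseline by the same factor leaves every projection unchanged, so the images alone cannot tell the two scales apart:

```python
import numpy as np

K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])

def project(K, R, t, X):
    """Project 3D points X (Nx3) with pose [R|t] and intrinsics K."""
    x = (K @ (R @ X.T + t.reshape(3, 1))).T
    return x[:, :2] / x[:, 2:3]

# A toy "ball": a few 3D points several metres in front of camera 1 = [I|0].
X = np.array([[ 0.0, 0.0, 5.0],
              [ 0.3, 0.1, 5.2],
              [-0.2, 0.2, 4.9]])
R_rel = np.eye(3)
t_unit = np.array([1.0, 0.0, 0.0])  # direction of the baseline

I, zero = np.eye(3), np.zeros(3)
for s in (1.0, 10.0):  # small scene vs. ten-times-larger scene
    proj1 = project(K, I, zero, s * X)
    proj2 = project(K, R_rel, -s * t_unit, s * X)
    print(f"scale {s}: image 1\n{proj1}\nimage 2\n{proj2}")
# Both scales produce identical pixel coordinates in both images,
# so point correspondences alone cannot reveal the true size.
```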

Upvotes: 2
