YAHOOOOO

Reputation: 957

Get 3D coordinates from 2D image pixel if extrinsic and intrinsic parameters are known

I am doing camera calibration using Tsai's algorithm. I have obtained the intrinsic and extrinsic matrices, but how can I reconstruct the 3D coordinates from that information?

I now have two ways to find X, Y, Z:

  1. I can use Gaussian elimination to find X, Y, Z, W, and then the point is X/W, Y/W, Z/W, since the system is homogeneous.

  2. I can use the OpenCV documentation approach:

    s * [u, v, 1]^T = K * [R | t] * [X, Y, Z, 1]^T     (the pinhole projection model from the OpenCV docs)

    Since I know u, v, R, and t, I can compute X, Y, Z (a sketch of this model is shown after the list).
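For reference, here is a minimal NumPy sketch of that projection model in the forward (3D to 2D) direction; the calibration values are made-up placeholders, just to make the equation concrete:

import numpy as np

# Placeholder calibration (illustrative values only)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # intrinsic matrix
R = np.eye(3)                            # rotation (world -> camera)
t = np.array([0.0, 0.0, 5.0])            # translation
X_world = np.array([0.1, -0.2, 2.0])     # a 3D point in world coordinates

# Pinhole model: s * [u, v, 1]^T = K * (R * X + t)
p = K @ (R @ X_world + t)
u, v = p[:2] / p[2]                      # divide by the scale factor s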

However, both methods give different results, and neither is correct.

What am I doing wrong?

Upvotes: 41

Views: 40599

Answers (2)

Jav_Rock

Reputation: 22245

If you have the extrinsic parameters, then you have everything you need. You can build a homography from the extrinsics (also called the camera pose). The pose is a 3x4 matrix, the homography is a 3x3 matrix, and H is defined as

H = K*[r1, r2, t],       //eqn 8.1, Hartley and Zisserman

with K being the camera intrinsic matrix, r1 and r2 being the first two columns of the rotation matrix R, and t being the translation vector.

Then normalize H by dividing everything by t3, the third component of t.

What happens to column r3? Don't we use it? No, because it is redundant: it is the cross product of the first two columns of the pose.

Now that you have the homography, project the points. Your 2D points are (x, y). Give them a z = 1 component so they are now 3D, and project them as follows:

p          = [x y 1];
projection = H * p;                       //project
projnorm   = projection / projection(3);  //normalize by the third component
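Put together, a minimal NumPy sketch of this recipe (K, R, t below are placeholder values; swap in your own calibration):

import numpy as np

def homography_from_pose(K, R, t):
    # H = K * [r1, r2, t], then divide everything by t3 as described above
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    return H / t[2]

# Placeholder calibration data
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.1, 0.2, 2.0])

H = homography_from_pose(K, R, t)

x, y = 100.0, 50.0                       # a 2D point
p = np.array([x, y, 1.0])                # add z = 1
projection = H @ p                       # project
projnorm   = projection / projection[2]  # normalize by the third component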

Upvotes: 37

DerekG

Reputation: 3958

As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinate, as this information is completely lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image-space points before projection, as in Jav_Rock's answer:

p          = [x y 1];
projection = H * p;                       //project
projnorm   = projection / projection(3);  //normalize by the third component

One interesting alternative to this dummy-value solution is to train a model to predict the depth of each point prior to reprojection into 3D camera space. I tried this method and had a high degree of success using a PyTorch CNN trained on 3D bounding boxes from the KITTI dataset. I would be happy to provide code, but it would be a bit lengthy to post here.
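As a sketch of the general idea (not the KITTI-trained model mentioned above), once a depth z along the camera's optical axis has been predicted for a pixel, the camera-space point follows from the intrinsics alone:

import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])    # placeholder intrinsic matrix

u, v = 350.0, 260.0   # pixel coordinates
z = 4.2               # depth predicted by some model (hypothetical value)

# Back-project: X_cam = z * K^-1 * [u, v, 1]^T
X_cam = z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
print(X_cam)          # 3D point in camera space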

Upvotes: 3
