user1589759

Reputation: 182

Inverse Perspective Transform?

I am trying to compute the bird's eye view of a given image. I have the rotations and translations (and the intrinsic matrix) required to convert it into the bird's eye plane. My aim is to find the inverse homography matrix (3x3).

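import cv2
import numpy as np

# R_x, t_y, s_x, f, dp_x, dp_y, source_image, w and h are defined elsewhere
rotation_x = np.asarray([[1,0,0,0],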
                        [0,np.cos(R_x),-np.sin(R_x),0],
                        [0,np.sin(R_x),np.cos(R_x),0],
                        [0,0,0,1]],np.float32)

translation = np.asarray([[1, 0, 0, 0],
                         [0, 1, 0, 0 ],
                         [0, 0, 1, -t_y/(dp_y * np.sin(R_x))],
                         [0, 0, 0, 1]],np.float32)

intrinsic = np.asarray([[s_x * f / (dp_x  ),0, 0, 0],
                       [0, 1 * f / (dp_y ) ,0, 0 ],
                       [0,0,1,0]],np.float32)

# Projection matrix lifting image coordinates (x, y, 1) to 3-D homogeneous coordinates (x, y, 0, 1); not sure if this is the right approach
projection = np.asarray([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 0],
                        [0, 0, 1]], np.float32)

homography_matrix = intrinsic @ translation @ rotation_x @ projection

inv = cv2.warpPerspective(source_image, homography_matrix,(w,h),flags = cv2.INTER_CUBIC  | cv2.WARP_INVERSE_MAP)

My question: is this the right approach? I can manually set a (ty, rx) pair that gives a suitable result, but the result is not right for the (ty, rx) that is provided.

Upvotes: 4

Views: 6743

Answers (1)

Francesco Callari

Reputation: 11785

First premise: your bird's eye view will be correct only for one specific plane in the image, since a homography can only map planes (including the plane at infinity, corresponding to a pure camera rotation).

Second premise: if you can identify a quadrangle in the first image that is the projection of a rectangle in the world, you can directly compute the homography that maps the quad into the rectangle (i.e. the "bird's eye view" of the quad), and warp the image with it, setting the scale so the image warps to a desired size. No need to use the camera intrinsics. Example: you have the image of a building with rectangular windows, and you know the width/height ratio of these windows in the world.
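As a rough illustration of the quad-to-rectangle idea, here is a minimal NumPy sketch that solves for the homography from four point pairs. The corner coordinates, the 3:2 target rectangle and the helper names `homography_from_4_points` and `apply_homography` are made up for the example, and the solver assumes the bottom-right entry of H can be normalized to 1.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for the 3x3 H such that H @ [x, y, 1] ~ [u, v, 1] for four point pairs.

    Assumes no three points are collinear and that H[2, 2] can be fixed to 1.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.asarray(rows, dtype=np.float64),
                        np.asarray(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenize."""
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical pixel corners of a window as seen in the image (clockwise)
quad = np.array([[120.0, 80.0], [410.0, 60.0], [450.0, 330.0], [90.0, 300.0]])
# Target rectangle with the window's known 3:2 width/height ratio
rect = np.array([[0.0, 0.0], [300.0, 0.0], [300.0, 200.0], [0.0, 200.0]])

H_quad = homography_from_4_points(quad, rect)
```

With OpenCV, `cv2.getPerspectiveTransform(quad.astype(np.float32), rect.astype(np.float32))` should give the same matrix up to rounding, and `cv2.warpPerspective(image, H_quad, (300, 200))` would then produce the rectified view.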

Sometimes you can't find rectangles, but your camera is calibrated, and thus the problem you describe comes into play. Let's do the math. Assume the plane you are observing in the given image is Z=0 in world coordinates. Let K be the 3x3 intrinsic camera matrix and [R, t] the 3x4 matrix representing the camera pose in the XYZ world frame, so that if Pc and Pw represent the same 3D point in camera and world coordinates respectively, then Pc = R*Pw + t = [R, t] * [Pw.T, 1].T, where .T denotes the transpose. Then you can write the camera projection as:

s * p = K * [R, t] * [Pw.T, 1].T

where s is an arbitrary scale factor and p is the pixel that Pw projects onto. But if Pw=[X, Y, Z].T is on the Z=0 plane, the 3rd column of R only multiplies zeros, so we can ignore it. If we then denote with r1 and r2 the first two columns of R, we can rewrite the above equation as:

s * p = K * [r1, r2, t] * [X, Y, 1].T

But K * [r1, r2, t] is a 3x3 matrix that transforms points on a 3D plane to points on the camera plane, so it is a homography.
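To make the derivation concrete, the following sketch builds K * [r1, r2, t] with NumPy and checks that it agrees with the full projection for a point on Z=0. The intrinsics, the 120 degree rotation about X, the translation and the helper names are arbitrary values chosen for illustration, not values from the question.

```python
import numpy as np

def rotation_about_x(angle):
    """3x3 rotation matrix for a rotation of `angle` radians about the X axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def plane_to_image_homography(K, R, t):
    """H = K @ [r1, r2, t], mapping (X, Y, 1) on the Z=0 plane to pixels."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

# Assumed example intrinsics and pose (world -> camera)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_about_x(np.deg2rad(120.0))
t = np.array([0.0, 0.0, 5.0])

H = plane_to_image_homography(K, R, t)

# A world point on the Z=0 plane
Pw = np.array([1.5, -2.0, 0.0])

# Full projection s * p = K * (R * Pw + t)
p_full = K @ (R @ Pw + t)
p_full = p_full / p_full[2]

# Same point through the 3x3 homography
p_homog = H @ np.array([Pw[0], Pw[1], 1.0])
p_homog = p_homog / p_homog[2]
```

Both `p_full` and `p_homog` should hold the same pixel, since the dropped third column of R only ever multiplies Z=0.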

If the plane is not Z=0, you can repeat the same argument replacing [R, t] with [R, t] * inv([Rp, tp]), where [Rp, tp] is the coordinate transform that maps a frame on the plane, with the plane normal being the Z axis, to the world frame.

Finally, to obtain the bird's eye view, you select a rotation R whose third column (the components of the world's Z axis in camera frame) is opposite to the plane's normal.
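One possible reading of this last step, sketched under assumptions: build a second, virtual camera that looks straight down at the plane, and compose its homography with the inverse of the real camera's homography. The virtual pose `R_top = diag(1, -1, -1)` at a height of 10 units, the reuse of the same K for the virtual camera, and all numeric values are illustrative choices, not something stated in the answer.

```python
import numpy as np

def plane_to_image_homography(K, R, t):
    """H = K @ [r1, r2, t], mapping (X, Y, 1) on the Z=0 plane to pixels."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

# Assumed intrinsics, shared by the real and the virtual camera
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Real camera: tilted 120 degrees about X, viewing the plane obliquely (example pose)
c, s = np.cos(np.deg2rad(120.0)), np.sin(np.deg2rad(120.0))
R_cam = np.array([[1.0, 0.0, 0.0],
                  [0.0, c, -s],
                  [0.0, s, c]])
t_cam = np.array([0.0, 0.0, 5.0])
H_cam = plane_to_image_homography(K, R_cam, t_cam)

# Virtual top-down camera: world Z axis is (0, 0, -1) in camera coordinates,
# so the optical axis points straight at the plane from `height` units above it
height = 10.0
R_top = np.diag([1.0, -1.0, -1.0])
t_top = np.array([0.0, 0.0, height])
H_top = plane_to_image_homography(K, R_top, t_top)

# Maps a pixel of the source image to the corresponding bird's eye pixel
M = H_top @ np.linalg.inv(H_cam)

# Check with a plane point: project into the source image, then warp
plane_pt = np.array([1.5, -2.0, 1.0])
p_src = H_cam @ plane_pt
p_bev = M @ p_src
p_bev = p_bev / p_bev[2]
```

Because `M` maps source pixels to destination pixels, it would be passed to `cv2.warpPerspective(source_image, M, (w, h))` without `cv2.WARP_INVERSE_MAP`; with that flag set, the inverse of `M` is what OpenCV expects.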

Upvotes: 6
