Reputation: 306
I am having problems understanding the functionality of cv2.recoverPose(points1, points2). From my understanding of the documentation, this function should return the rotation matrix from camera1 to camera2 as well as the translation from camera1 to camera2.
I am using an artificial set of points, points_3d, as well as the transformation between two cameras, T_1_0, to recover the camera transformation in Python. To get the points on the cameras, I calculate the projections of the 3D points onto the camera sensors:
import cv2
import numpy as np
def calc_projection(K_c, transform, pt_3d):
    # Project pt_3d into the image: invert the camera transform, apply the intrinsics, then dehomogenize
    p_projected = np.hstack((K_c, np.zeros((3,1)))) @ np.linalg.inv(np.vstack((transform, [0,0,0,1]))) @ np.vstack((pt_3d.reshape(3,1), 1))
    p_projected = p_projected[:2,:] / p_projected[2,:]
    p_projected = p_projected.ravel()
    return p_projected
points_3d = np.random.rand(100, 3)
K_c = np.eye(3)
T_0 = np.hstack((np.eye(3), np.zeros((3,1))))
rot_vec = np.array([0.2, 0.1, 0.3])
R_1_0, _ = cv2.Rodrigues(np.array(rot_vec))
t_0_10 = np.array([0.2, 0.4, 0.1])
T_1_0 = np.hstack((R_1_0, t_0_10.reshape(3,1)))
points1 = []
points2 = []
for pt_3d in points_3d:
    points1.append(calc_projection(K_c, T_0, pt_3d))
    points2.append(calc_projection(K_c, T_1_0, pt_3d))
points1 = np.array(points1)
points2 = np.array(points2)
E, mask = cv2.findEssentialMat(points1, points2, method=cv2.RANSAC)
inliers1 = points1[mask]
inliers2 = points2[mask]
_, R, t, _ = cv2.recoverPose(E, inliers1, inliers2)
r, _ = cv2.Rodrigues(R)
assert np.allclose(r, rot_vec)
assert np.allclose(t, t_0_10)
I would expect the result to be equal to T_1_0 (as per the assertions), but the result is:
r = [[0.20329041]
[0.15711541]
[0.37188371]]
t = [[0.50969714]
[0.79593836]
[0.32663581]]
What am I missing here? Why is it not working as expected? Am I doing something wrong, or is this the expected behavior?
Edit: The formula I've used for the projection wrongly introduces the inverse of the transformation. It should be the following instead:
p_projected = np.hstack((K_c, np.zeros((3,1)))) @ np.vstack((transform, [0,0,0,1])) @ np.vstack((pt_3d.reshape(3,1), 1))
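For reference, a minimal sketch of the helper with that single line corrected (same names as above):
def calc_projection(K_c, transform, pt_3d):
    # Apply the camera transform directly (no inverse), then the intrinsics, then dehomogenize
    P = np.hstack((K_c, np.zeros((3, 1)))) @ np.vstack((transform, [0, 0, 0, 1]))
    p_projected = P @ np.vstack((pt_3d.reshape(3, 1), 1))
    p_projected = p_projected[:2, :] / p_projected[2, :]
    return p_projected.ravel()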
solvePnP also solves the problem that I've been trying to solve here (recovering the pose from the 3D points and their 2D projections):
_, r, t = cv2.solvePnP(points_3d, points2, K_c, None)
assert np.allclose(r, rot_vec.reshape(3, 1), rtol=0.15)
assert np.allclose(t, t_0_10.reshape(3, 1), rtol=0.15)
But I still don't know why cv2.recoverPose does not work. Looking at the documentation, it should also return the rotation and translation...
Upvotes: 2
Views: 2418
Reputation: 416
This has been tricky to get right. There are so many conflicting explanations that I felt it best to create a dummy script, which helped me understand exactly what was going on. The OpenCV documentation is not descriptive enough and doesn't help much.
Let 3D points in the world coordinate system (CS), or camera 1's CS, be defined as X1. In a two-camera system we can only define the cameras with respect to each other, so for simplicity we generally assume camera 1's CS to be the world CS. Let 3D points in camera 2's coordinate system be defined as X2.
Then X2 = R @ X1 + t, where R and t are the quantities recovered by cv2.recoverPose.
If you want to get camera 2's center and orientation in camera 1's CS, you need the center t_cam2 = -R.T @ t (equivalently -inv(R) @ t) and the orientation R_cam2 = R.T.
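Here is a minimal sanity check of these identities (a small sketch, separate from the full script below; the names are just for illustration):
import numpy as np
from scipy.spatial.transform import Rotation

# Example pose of camera 2 relative to camera 1: X2 = R @ X1 + t
R_mat = Rotation.from_euler('y', -20, degrees=True).as_matrix()
t = np.array([50.0, 0.0, 0.0])

# Camera 2's center in camera 1's CS is the point X1 that maps to X2 = 0
center_cam2 = -R_mat.T @ t
assert np.allclose(R_mat @ center_cam2 + t, np.zeros(3))

# Camera 2's orientation (its axes expressed in camera 1's CS) is R.T
orientation_cam2 = R_mat.T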
Please feel free to use my script and convince yourself:
import numpy as np
import cv2
from scipy.spatial.transform import Rotation as R
import matplotlib.pyplot as plt
if __name__ == '__main__':
    # Generating a random point cloud within specified spatial limits
    num_points = 100  # Reduced number of points for clarity
    points_3d = np.random.rand(num_points, 3)  # Random points in 3D

    # Scale and shift points to fit within the specified range
    points_3d[:, 0] = points_3d[:, 0] * 1000 - 500  # X: -500 to 500 mm
    points_3d[:, 1] = points_3d[:, 1] * 1000 - 500  # Y: -500 to 500 mm
    points_3d[:, 2] = points_3d[:, 2] * 4500 + 500  # Z: 500 to 5000 mm

    # Defining camera intrinsic parameters
    fov_x, fov_y = 80, 45  # Field of view in degrees for X and Y axes
    resolution_x, resolution_y = 1920, 1080  # Camera resolution

    # Focal lengths derived from field of view and sensor resolution
    fx = resolution_x / (2 * np.tan(np.deg2rad(fov_x) / 2))
    fy = resolution_y / (2 * np.tan(np.deg2rad(fov_y) / 2))

    # Assuming the principal point is at the center of the image
    cx, cy = resolution_x / 2, resolution_y / 2

    intrinsic_matrix = np.array([
        [fx, 0, cx],
        [0, fy, cy],
        [0, 0, 1]  # Homogeneous component
    ])

    # Camera 1 (Origin) - no rotation or translation
    R1_identity = np.eye(3)  # Identity rotation matrix
    t1_zero = np.zeros((3, 1))  # Zero translation

    # Camera 2 - positioned 50mm to the right and rotated -20 degrees around Y-axis
    t2_translation = np.array([[50], [0], [0]])  # Translation vector
    R2_rotation = R.from_euler('y', -20, degrees=True).as_matrix()  # Rotation matrix

    # Constructing projection matrices without considering intrinsic parameters for simplicity
    P1_projection = np.dot(np.eye(3), np.hstack((R1_identity, t1_zero)))
    P2_projection = np.dot(np.eye(3), np.hstack((R2_rotation, t2_translation)))

    # Projecting 3D points to each camera's coordinate system
    points_3d_homogeneous = np.hstack((points_3d, np.ones((num_points, 1))))  # Homogeneous coordinates
    points_in_cam1 = (P1_projection @ points_3d_homogeneous.T).T
    points_in_cam2 = (P2_projection @ points_3d_homogeneous.T).T

    # Converting homogeneous coordinates to 2D image points
    points_in_cam1_2d = points_in_cam1[:, :2] / points_in_cam1[:, 2, np.newaxis]
    points_in_cam2_2d = points_in_cam2[:, :2] / points_in_cam2[:, 2, np.newaxis]

    # Estimating the essential matrix from point correspondences
    E_matrix, inliers = cv2.findEssentialMat(
        points_in_cam1_2d, points_in_cam2_2d,
        focal=1.0, pp=(0., 0.), method=cv2.LMEDS, prob=0.999, threshold=1.0
    )

    # Decomposing the essential matrix to find relative rotation and translation
    _, recovered_R_matrix, recovered_t_vector, _ = cv2.recoverPose(E_matrix, points_in_cam1_2d, points_in_cam2_2d)

    # Converting rotation matrices to Euler angles for easier interpretation
    recovered_rotation_euler = R.from_matrix(recovered_R_matrix).as_euler('xyz', degrees=True)
    groundtruth_rotation_euler = R.from_matrix(R2_rotation).as_euler('xyz', degrees=True)

    print('Recovered Rotation (Euler angles):', recovered_rotation_euler)
    print('Ground Truth Rotation (Euler angles):', groundtruth_rotation_euler)

    # Calculating Camera 2's position relative to Camera 1
    camera2_position = -np.linalg.inv(R2_rotation) @ t2_translation
    camera2_orientation_in_world = R2_rotation.T  # Orientation relative to the world

    # Projection matrix of Camera 2 with intrinsic parameters
    P2_with_intrinsic = np.dot(intrinsic_matrix, np.hstack((R2_rotation, t2_translation)))
    M_orientation = P2_with_intrinsic[:, :3]  # Orientation component of the projection matrix
    principal_ray_direction = M_orientation[2, :]  # Camera 2's principal ray

    # Testing which rotation gives the correct principal ray direction
    incorrect_principal_ray = R2_rotation @ np.array([0, 0, 1])
    correct_principal_ray = R2_rotation.T @ np.array([0, 0, 1])

    print('Camera 2 Principal Ray Direction:', principal_ray_direction)
    print('Incorrect Principal Ray (Using R2):', incorrect_principal_ray)
    print('Correct Principal Ray (Using R2.T):', correct_principal_ray)
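The script only compares the recovered and ground-truth rotations. Since the translation from recoverPose is only defined up to scale, a small optional addition at the end of the script (just a sketch) could compare translation directions as well:

    # Compare translation directions, since recoverPose only returns t up to an unknown scale factor
    t_recovered_dir = recovered_t_vector.ravel() / np.linalg.norm(recovered_t_vector)
    t_groundtruth_dir = t2_translation.ravel() / np.linalg.norm(t2_translation)
    print('Recovered translation direction:', t_recovered_dir)
    print('Ground truth translation direction:', t_groundtruth_dir)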
Upvotes: 2
Reputation: 306
Most important findings:
The cv2.recoverPose(points1, points2, ...) function returns the rotation from camera1 to camera2 if points1 are found on camera1 and points2 are found on camera2, i.e. R and t such that X2 = R @ X1 + t.
The returned translation vector belongs to that same camera1-to-camera2 transform; it is the position of camera1's origin expressed in camera2's coordinate frame.
The translation vector is only found up to a scale factor, which can't be recovered without further information.
Setting the cameraMatrix on findEssentialMat is important.
I am now able to use cv2.recoverPose and recover the right rotation and translation. Here is the updated code:
E, mask = cv2.findEssentialMat(points1, points2, cameraMatrix=K_c)
inliers1 = points1[mask]
inliers2 = points2[mask]
_, R, t, _ = cv2.recoverPose(E, inliers1, inliers2, cameraMatrix=K_c)
r, _ = cv2.Rodrigues(R)
The results are
# As expected
r = [[0.2]
[0.1]
[0.3]]
# expected: [0.2, 0.4, 0.1], but is:
t = [[0.43643578]
[0.87287156]
[0.21821789]]
But! The documentation says that the translation can only be recovered up to a scale factor. So in this case the following assertion holds and the behavior is as expected:
factors = t.ravel() / t_0_10
assert np.allclose(factors, factors[0])
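Equivalently (a small sketch of the same check; recoverPose returns t with unit length, so comparing normalized directions also works):
t_dir = t.ravel() / np.linalg.norm(t)
t_expected_dir = t_0_10 / np.linalg.norm(t_0_10)
assert np.allclose(t_dir, t_expected_dir)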
Comment: What's funny is that with K_c = np.eye(3) it sometimes works and sometimes doesn't. But if I use the intrinsics of a real camera, the assertion always holds:
K_c = np.array([
    [485.0, 0.0, 320.0],
    [0.0, 485.0, 240.0],
    [0.0, 0.0, 1.0]
])
Upvotes: 2