Reputation: 212
I have a calibrated stereo camera setup with the 11 DLT coefficients for each camera (coefficients estimated using the easyWand package that uses a 'wand' calibration object moved across the scene). 3D projection using the DLT method and the obtained coefficients works fine and produces sensible results (see figure below).
Obtained parabolic trajectory of a thrown object using DLT triangulation (plot has been rotated to align with gravity)
However, when I convert the DLT coefficients into a projection matrix P (where x = P X, with x the homogeneous 2D pixel coordinates, P a 3x4 matrix, and X a homogeneous 4x1 vector holding the object's 3D coordinates), the 'shape' of the trajectory makes sense but the values don't (values close to ~0.01 for x, y, z).
Obtained trajectory using the projection-matrix-based method cv2.triangulatePoints (plot has been rotated to align with gravity)
Can anyone please explain why the Projection matrix based method produces oddly scaled coordinates or what is being done wrong here?
The discrepancy doesn't make sense as the DLT coefficients and projection matrix are inter-convertible (e.g. see this link).
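For reference, here is a sketch of the conventional conversion (assuming the standard 11-parameter DLT form, where the 12th entry is fixed to 1 and the coefficients fill the 3x4 matrix row by row), together with the projection equation it implies. Function names here are my own, for illustration only:

```python
import numpy as np

def dlt_to_P(dltcoefs):
    """Sketch: place the 11 DLT coefficients row-major into a 3x4
    projection matrix, with the 12th entry fixed to 1."""
    return np.append(dltcoefs, 1.0).reshape(3, 4)

def project(P, X):
    """Project a 3D point X through P; return (u, v) pixel coordinates."""
    x = P @ np.append(X, 1.0)   # homogeneous image point
    return x[:2] / x[2]         # perspective divide
```

With this row-major layout, u = (L1 X + L2 Y + L3 Z + L4) / (L9 X + L10 Y + L11 Z + 1), which is the usual DLT projection equation.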
# The DLT coefficients for the two cameras
import numpy as np
import cv2
camera1_dlt = np.array([-26.618, -169.460, 57.650, 306.560, 24.733, 11.723, 169.830,
95.812, 0.220, -0.161, 0.110])
camera2_dlt = np.array([-75.615, -80.218, 43.704, 412.710, 1.655, 21.247, 110.420,
235.580, 0.061, -0.145, 0.112])
# The projection matrix was derived from the DLT coefficients using the function below
def extract_P_from_dlt(dltcoefs):
    '''function which recreates the Projection matrix
    '''
    # dltcoefs contains basically the correct matrix, it just needs to be
    # reshaped correctly
    all_coefs = np.concatenate((dltcoefs, np.array([1]))).flatten()
    P = all_coefs.reshape((4,3)).T
    return P
# The subset of common *undistorted* 2D points is given below
cam1_pts = np.array([ [ 98.221, 358.293],
[ 142.927, 348.913],
[ 186.275, 344.950],
[ 225.152, 341.567]],dtype='float32')
cam2_pts = np.array([ [ 314.633, 195.970],
[ 335.168, 200.769],
[ 354.778, 207.820],
[ 372.543, 215.284]], dtype='float32')
# generate projection matrix
Pcam1 = extract_P_from_dlt(camera1_dlt)
Pcam2 = extract_P_from_dlt(camera2_dlt)
# Now get 3D positions using cv2.triangulatePoints
xyz_cv2 = []
for pt1, pt2 in zip(cam1_pts, cam2_pts):
    pt1_homog, pt2_homog = (X.reshape(1,1,2) for X in [pt1, pt2])
    position = cv2.triangulatePoints(Pcam1, Pcam2, pt1_homog, pt2_homog)
    final_xyz = cv2.convertPointsFromHomogeneous(position.T).flatten()
    xyz_cv2.append(final_xyz)
Upvotes: 0
Views: 875
Reputation: 212
UPDATED ANSWER - FEATURING A LESSON IN KNOWING IMAGE ORIGIN CONVENTIONS!
After a few back-and-forths with the developer of the easyWand package, it turns out the image origin convention plays a big role.
The DLT coefficients from the easyWand package were generated assuming a bottom-left origin, while my input pixel data assumed a top-left origin. This difference in origin conventions led to a wrong triangulation! There is NO need for any kind of normalisation!
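As a minimal illustration of the origin flip (assuming a 512-pixel-tall image, so the last row index is 511; substitute your camera's actual height):

```python
import numpy as np

H = 512  # image height in pixels (assumption; use your camera's actual height)

pts_topleft = np.array([[ 98.221, 358.293],
                        [142.927, 348.913]], dtype='float32')

# Flip the v (row) coordinate: top-left origin -> bottom-left origin.
# The u (column) coordinate is unchanged.
pts_bottomleft = pts_topleft.copy()
pts_bottomleft[:, 1] = (H - 1) - pts_bottomleft[:, 1]
```

Applying the same flip twice recovers the original coordinates, so it is easy to verify you have the convention right.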
Here is the final working code which provides the verified results:
# The DLT coefficients for the two cameras
import numpy as np
import cv2
camera1_dlt = np.array([-26.618, -169.460, 57.650, 306.560, 24.733, 11.723, 169.830,
95.812, 0.220, -0.161, 0.110])
camera2_dlt = np.array([-75.615, -80.218, 43.704, 412.710, 1.655, 21.247, 110.420,
235.580, 0.061, -0.145, 0.112])
# The projection matrix was derived from the DLT coefficients using the function below
def extract_P_from_dlt_v2(dltcoefs):
    '''Places the dlt coefficients into the 3x4 matrix
    in row-major order - NO normalisation
    '''
    dltcoefs = np.append(dltcoefs, 1)
    P = dltcoefs.reshape(3,4)
    return P
# The subset of common *undistorted* 2D points are given below here with
# top-left image origin (upper row is 0 and left column is 0)
cam1_pts = np.array([ [ 98.221, 358.293],
[ 142.927, 348.913],
[ 186.275, 344.950],
[ 225.152, 341.567]],dtype='float32')
cam2_pts = np.array([ [ 314.633, 195.970],
[ 335.168, 200.769],
[ 354.778, 207.820],
[ 372.543, 215.284]], dtype='float32')
# move the origin from top left to bottom left (511 = image height - 1)
# Now the bottom row is 0; the left column is unchanged
cam1_pts[:,1] = 511 - cam1_pts[:,1]
cam2_pts[:,1] = 511 - cam2_pts[:,1]
# generate projection matrix
Pcam11 = extract_P_from_dlt_v2(camera1_dlt)
Pcam22 = extract_P_from_dlt_v2(camera2_dlt)
# Now get 3D positions using cv2.triangulatePoints
xyz_cv2 = []
for pt1, pt2 in zip(cam1_pts, cam2_pts):
    pt1_homog, pt2_homog = (X.reshape(1,1,2) for X in [pt1, pt2])
    position = cv2.triangulatePoints(Pcam11, Pcam22, pt1_homog, pt2_homog)
    final_xyz = cv2.convertPointsFromHomogeneous(position.T).flatten()
    xyz_cv2.append(final_xyz)
xyz_cv2 = np.array(xyz_cv2).reshape(-1,3)

import matplotlib.pyplot as plt
plt.figure()
a0 = plt.subplot(111, projection='3d')
plt.plot(xyz_cv2[:,0], xyz_cv2[:,2], xyz_cv2[:,1], '*') # the y and z need to be swapped
The output xyz coordinates match those produced by directly using the DLT reconstruction approach - which takes in the 11 DLT coefficients to triangulate object positions.
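For anyone wanting to cross-check without OpenCV, the direct DLT reconstruction mentioned above can be sketched as a small least-squares solve (this follows the standard DLT projection equations; the function name and structure are my own, not from easyWand):

```python
import numpy as np

def dlt_triangulate(dlt_coef_sets, uv_points):
    """Triangulate one 3D point from >= 2 cameras' 11 DLT coefficients.

    For each camera with coefficients L1..L11 and observed pixel (u, v):
        u = (L1 x + L2 y + L3 z + L4) / (L9 x + L10 y + L11 z + 1)
        v = (L5 x + L6 y + L7 z + L8) / (L9 x + L10 y + L11 z + 1)
    which rearranges into two linear equations in (x, y, z).
    """
    A, b = [], []
    for L, (u, v) in zip(dlt_coef_sets, uv_points):
        A.append([L[0] - u * L[8], L[1] - u * L[9], L[2] - u * L[10]])
        A.append([L[4] - v * L[8], L[5] - v * L[9], L[6] - v * L[10]])
        b.append(u - L[3])
        b.append(v - L[7])
    xyz, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return xyz
```

Feeding this the 11 coefficients per camera and the origin-corrected pixel pairs should reproduce the cv2.triangulatePoints results above.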
Upvotes: 0
Reputation: 594
The stacking of the coefficients should be in the format below:
[ L1 L2 L3 L4 ]
[ L5 L6 L7 L8 ]
[ L9 L10 L11 L12 ]
But your extract_P_from_dlt gives:
[ L1 L4 L7 L10 ]
[ L2 L5 L8 L11]
[ L3 L6 L9 L12 ]
In addition, instead of stacking 1 for L12, stack 0. Here is the updated function.
def extract_P_from_dlt(dltcoefs):
    '''function which recreates the Projection matrix'''
    dltcoefs = np.append(dltcoefs, 0)
    norm = np.linalg.norm(dltcoefs)
    dltcoefs = dltcoefs/norm
    P = dltcoefs.reshape(3,4)
    return P
Overall code:
# The DLT coefficients for the two cameras
import numpy as np
import cv2
camera1_dlt = np.array([-26.618, -169.460, 57.650, 306.560, 24.733, 11.723, 169.830,
95.812, 0.220, -0.161, 0.110])
camera2_dlt = np.array([-75.615, -80.218, 43.704, 412.710, 1.655, 21.247, 110.420,
235.580, 0.061, -0.145, 0.112])
# The projection matrix was derived from the DLT coefficients using the function below
def extract_P_from_dlt(dltcoefs):
    '''function which recreates the Projection matrix'''
    dltcoefs = np.append(dltcoefs, 0)
    norm = np.linalg.norm(dltcoefs)
    dltcoefs = dltcoefs/norm
    P = dltcoefs.reshape(3,4)
    return P
# The subset of common *undistorted* 2D points are given below here
cam1_pts = np.array([ [ 98.221, 358.293],
[ 142.927, 348.913],
[ 186.275, 344.950],
[ 225.152, 341.567]],dtype='float32')
cam2_pts = np.array([ [ 314.633, 195.970],
[ 335.168, 200.769],
[ 354.778, 207.820],
[ 372.543, 215.284]], dtype='float32')
# generate projection matrix
Pcam1 = extract_P_from_dlt(camera1_dlt)
Pcam2 = extract_P_from_dlt(camera2_dlt)
# Now get 3D positions using cv2.triangulatePoints
xyz_cv2 = []
for pt1, pt2 in zip(cam1_pts, cam2_pts):
    pt1_homog, pt2_homog = (X.reshape(1,1,2) for X in [pt1, pt2])
    position = cv2.triangulatePoints(Pcam1, Pcam2, pt1_homog, pt2_homog)
    final_xyz = cv2.convertPointsFromHomogeneous(position.T).flatten()
    xyz_cv2.append(final_xyz)
This produces the image below. I am not sure about the normalization part.
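Regarding the normalization: a projection matrix is only defined up to a nonzero scale factor, so dividing P by its norm (or any other constant) leaves the dehomogenized result unchanged. A quick sketch of that invariance, using a hypothetical P and plain NumPy:

```python
import numpy as np

P = np.arange(12, dtype=float).reshape(3, 4) + 1.0   # hypothetical 3x4 projection
X = np.array([0.5, -1.2, 3.0, 1.0])                  # homogeneous 3D point

def dehomogenize(x):
    """Perspective divide: homogeneous image point -> (u, v)."""
    return x[:2] / x[2]

uv_plain  = dehomogenize(P @ X)           # original matrix
uv_scaled = dehomogenize((7.3 * P) @ X)   # arbitrarily rescaled matrix
```

Both give the same pixel coordinates, so the normalization step neither helps nor hurts the geometry; only the coefficient ordering and the L12 entry matter.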
Upvotes: 1