Thejasvi

Reputation: 212

3D triangulation works with DLT but not with a projection matrix using cv2.triangulatePoints

Outline

I have a calibrated stereo camera setup with the 11 DLT coefficients for each camera (coefficients estimated using the easyWand package, which uses a 'wand' calibration object moved across the scene). 3D triangulation using the DLT method and the obtained coefficients works fine and produces sensible results (see figure below).

Obtained parabolic trajectory of a thrown object using DLT triangulation (plot has been rotated to align with gravity)

However, when I convert the DLT coefficients into a projection matrix P (where x = P X, x being the homogeneous 2D pixel coordinates, P a 3x4 matrix, and X the homogeneous 4x1 vector of object 3D coordinates), the 'shape' of the trajectory makes sense but the values don't (values close to ~0.01 for x, y, z).


Obtained trajectory using the projection-matrix-based method cv2.triangulatePoints (plot has been rotated to align with gravity)

Can anyone explain why the projection-matrix-based method produces oddly scaled coordinates, or what is being done wrong here?

The discrepancy doesn't make sense, as the DLT coefficients and the projection matrix are inter-convertible (e.g. see this link).
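For reference, the classic 11-parameter DLT model projects a 3D point to pixels as u = (L1·X + L2·Y + L3·Z + L4)/(L9·X + L10·Y + L11·Z + 1), and analogously for v with L5..L8; read row-wise, those two numerators plus the shared denominator are exactly the three rows of P (with 1 as the 12th entry). A minimal sketch (my own illustration, not part of the question's code):

# Sketch: forward projection with the classic 11-parameter DLT model.
# u = (L1*X + L2*Y + L3*Z + L4) / (L9*X + L10*Y + L11*Z + 1)
# v = (L5*X + L6*Y + L7*Z + L8) / (L9*X + L10*Y + L11*Z + 1)
def dlt_project(dltcoefs, xyz):
    '''Project one 3D point to (u, v) pixels using 11 DLT coefficients.'''
    L = dltcoefs
    X, Y, Z = xyz
    denom = L[8]*X + L[9]*Y + L[10]*Z + 1.0
    u = (L[0]*X + L[1]*Y + L[2]*Z + L[3]) / denom
    v = (L[4]*X + L[5]*Y + L[6]*Z + L[7]) / denom
    return u, v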

Data and code to replicate the results:

# The DLT coefficients for the two cameras 
import numpy as np 
import cv2 
camera1_dlt = np.array([-26.618, -169.460,  57.650,  306.560,  24.733,  11.723,  169.830,
        95.812,  0.220, -0.161,  0.110])
camera2_dlt = np.array([-75.615, -80.218,  43.704,  412.710,  1.655,  21.247,  110.420,
        235.580,  0.061, -0.145,  0.112])

# The projection matrix was derived from the DLT coefficients using the function below

def extract_P_from_dlt(dltcoefs):
    '''function which recreates the Projection matrix
    '''
    
    # dltcoefs contains basically the correct matrix, it just needs to be
    # reshaped correctly
    all_coefs = np.concatenate((dltcoefs, np.array([1]))).flatten()
    P = all_coefs.reshape((4,3)).T
    return P

# The subset of common *undistorted* 2D points are given below here
cam1_pts = np.array([ [ 98.221,  358.293],
                      [ 142.927,  348.913],
                      [ 186.275,  344.950],
                      [ 225.152,  341.567]],dtype='float32')

cam2_pts = np.array([ [ 314.633,  195.970],
                      [ 335.168,  200.769],
                      [ 354.778,  207.820],
                      [ 372.543,  215.284]], dtype='float32')

# generate projection matrix 
Pcam1 = extract_P_from_dlt(camera1_dlt)
Pcam2 = extract_P_from_dlt(camera2_dlt)

# Now get 3D positions using cv2.triangulatePoints
xyz_cv2 = []
for pt1, pt2 in zip(cam1_pts, cam2_pts):
    pt1_homog, pt2_homog = (X.reshape(1,1,2) for X in [pt1, pt2])
    position = cv2.triangulatePoints(Pcam1, Pcam2, pt1_homog, pt2_homog)
    final_xyz = cv2.convertPointsFromHomogeneous(position.T).flatten()
    xyz_cv2.append(final_xyz)

Upvotes: 0

Views: 875

Answers (2)

Thejasvi

Reputation: 212

UPDATED ANSWER - FEATURING A LESSON IN KNOWING IMAGE ORIGIN CONVENTIONS!

After a few back-and-forths with the developer of the easyWand package, it turns out the origin convention of the image plays a big role.

The DLT coefficients from the easyWand package were generated assuming a bottom-left image origin, while my input pixel data assumed a top-left origin. This difference in origin conventions led to a wrong triangulation (see the coordinate flip in the code below)! There is NO need for any kind of normalisation!

Here is the final working code which provides the verified results:


# The DLT coefficients for the two cameras 
import numpy as np
import cv2
import matplotlib.pyplot as plt

camera1_dlt = np.array([-26.618, -169.460,  57.650,  306.560,  24.733,  11.723,  169.830,
        95.812,  0.220, -0.161,  0.110])
camera2_dlt = np.array([-75.615, -80.218,  43.704,  412.710,  1.655,  21.247,  110.420,
        235.580,  0.061, -0.145,  0.112])

# The projection matrix was derived from the DLT coefficients using the function below

def extract_P_from_dlt_v2(dltcoefs):
    '''Place the 11 DLT coefficients row-wise into a 3x4
    projection matrix, appending 1 as the 12th entry.
    NO normalisation is applied.
    '''
    dltcoefs = np.append(dltcoefs, 1)
    P = dltcoefs.reshape(3, 4)
    return P


# The subset of common *undistorted* 2D points are given below here with 
# top-left image origin (upper row is 0 and left column is 0)
cam1_pts = np.array([ [ 98.221,  358.293],
                      [ 142.927,  348.913],
                      [ 186.275,  344.950],
                      [ 225.152,  341.567]],dtype='float32')

cam2_pts = np.array([ [ 314.633,  195.970],
                      [ 335.168,  200.769],
                      [ 354.778,  207.820],
                      [ 372.543,  215.284]], dtype='float32')

# Move the origin from upper left to bottom left by flipping the row
# coordinate: new_y = (image_height - 1) - y; the 511 implies frames
# 512 px tall. The bottom row is now 0; the left column is unchanged.
cam1_pts[:,1] = 511 - cam1_pts[:,1]
cam2_pts[:,1] = 511 - cam2_pts[:,1]

# generate projection matrix 
Pcam11 = extract_P_from_dlt_v2(camera1_dlt)
Pcam22 = extract_P_from_dlt_v2(camera2_dlt)

# Now get 3D positions using cv2.triangulatePoints
xyz_cv2 = []
for pt1, pt2 in zip(cam1_pts, cam2_pts):
    pt1_homog, pt2_homog = (X.reshape(1,1,2) for X in [pt1, pt2])
    position = cv2.triangulatePoints(Pcam11, Pcam22, pt1_homog, pt2_homog)
    final_xyz = cv2.convertPointsFromHomogeneous(position.T).flatten()
    xyz_cv2.append(final_xyz)

xyz_cv2 = np.array(xyz_cv2).reshape(-1,3)

plt.figure()
a0 = plt.subplot(111, projection='3d')
plt.plot(xyz_cv2[:,0], xyz_cv2[:,2], xyz_cv2[:,1], '*') # the y and z need to be swapped 

The output xyz coordinates match those produced by the direct DLT reconstruction approach, which takes the 11 DLT coefficients and triangulates object positions.
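For completeness, here is a minimal sketch of what that direct DLT reconstruction does: the standard linear least-squares triangulation from 11 coefficients per camera. This is a cross-check of my own, not the exact easyWand implementation, and the pixels fed in must use the same bottom-left origin:

# Sketch of direct DLT triangulation: each camera contributes two linear
# equations in (X, Y, Z) for a pixel (u, v); stack both cameras and solve
# by least squares. Not the exact easyWand implementation.
import numpy as np

def dlt_triangulate(dlt1, dlt2, uv1, uv2):
    A, b = [], []
    for L, (u, v) in [(dlt1, uv1), (dlt2, uv2)]:
        # From u = (L1*X + L2*Y + L3*Z + L4)/(L9*X + L10*Y + L11*Z + 1):
        # (L1 - u*L9)*X + (L2 - u*L10)*Y + (L3 - u*L11)*Z = u - L4
        # and the analogous equation for v with L5..L8.
        A.append([L[0] - u*L[8], L[1] - u*L[9], L[2] - u*L[10]])
        A.append([L[4] - v*L[8], L[5] - v*L[9], L[6] - v*L[10]])
        b.extend([u - L[3], v - L[7]])
    xyz, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return xyz

# e.g. dlt_triangulate(camera1_dlt, camera2_dlt, cam1_pts[0], cam2_pts[0])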

Upvotes: 0

cyborg

Reputation: 594

The stacking of the coefficients should follow the format below:

[ L1  L2  L3  L4 ]
[ L5  L6  L7  L8 ]
[ L9 L10 L11 L12 ]

But your extract_P_from_dlt gives:

[ L1  L4  L7 L10 ]
[ L2  L5  L8 L11 ]
[ L3  L6  L9 L12 ]

In addition, instead of appending 1 as L12, append 0. Here is the updated function.

def extract_P_from_dlt(dltcoefs):
    '''function which recreates the Projection matrix'''
    dltcoefs = np.append(dltcoefs, 0)
    norm = np.linalg.norm(dltcoefs)
    dltcoefs = dltcoefs/norm

    P = dltcoefs.reshape(3,4)
    return P

Overall code,

# The DLT coefficients for the two cameras 
import numpy as np 
import cv2 

camera1_dlt = np.array([-26.618, -169.460,  57.650,  306.560,  24.733,  11.723,  169.830,
        95.812,  0.220, -0.161,  0.110])
camera2_dlt = np.array([-75.615, -80.218,  43.704,  412.710,  1.655,  21.247,  110.420,
        235.580,  0.061, -0.145,  0.112])

# The projection matrix was derived from the DLT coefficients using the function below

def extract_P_from_dlt(dltcoefs):
    '''function which recreates the Projection matrix'''
    dltcoefs = np.append(dltcoefs, 0)
    norm = np.linalg.norm(dltcoefs)
    dltcoefs = dltcoefs/norm

    P = dltcoefs.reshape(3,4)
    return P


# The subset of common *undistorted* 2D points are given below here
cam1_pts = np.array([ [ 98.221,  358.293],
                      [ 142.927,  348.913],
                      [ 186.275,  344.950],
                      [ 225.152,  341.567]],dtype='float32')

cam2_pts = np.array([ [ 314.633,  195.970],
                      [ 335.168,  200.769],
                      [ 354.778,  207.820],
                      [ 372.543,  215.284]], dtype='float32')

# generate projection matrix 
Pcam1 = extract_P_from_dlt(camera1_dlt)
Pcam2 = extract_P_from_dlt(camera2_dlt)

# Now get 3D positions using cv2.triangulatePoints
xyz_cv2 = []
for pt1, pt2 in zip(cam1_pts, cam2_pts):
    pt1_homog, pt2_homog = (X.reshape(1,1,2) for X in [pt1, pt2])
    position = cv2.triangulatePoints(Pcam1, Pcam2, pt1_homog, pt2_homog)
    final_xyz = cv2.convertPointsFromHomogeneous(position.T).flatten()
    xyz_cv2.append(final_xyz)

This produces the image below. I am not sure about the normalization part.
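On the normalisation: a projection matrix is only defined up to scale, so dividing by the norm (or multiplying by any nonzero constant) cannot change the dehomogenised pixel coordinates or the triangulated point. A quick sketch to convince yourself, with arbitrary illustrative numbers:

# Scaling P by any nonzero constant leaves the dehomogenised (u, v) unchanged,
# because the scale cancels when dividing by the third homogeneous coordinate.
import numpy as np

P = np.arange(1.0, 13.0).reshape(3, 4)   # an arbitrary 3x4 projection matrix
X = np.array([0.5, -1.2, 3.0, 1.0])      # a homogeneous 3D point

for scale in (1.0, 0.01, 250.0):
    x = (scale * P) @ X
    print(x[:2] / x[2])                  # identical (u, v) for every scale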

Upvotes: 1
