socratic

Reputation: 339

Error in Gauss-Newton implementation for pose optimization

I’m using a modified version of a Gauss-Newton method to refine a pose estimate using OpenCV. The unmodified code can be found here: http://people.rennes.inria.fr/Eric.Marchand/pose-estimation/tutorial-pose-gauss-newton-opencv.html

The details of this approach are outlined in the corresponding paper:

Marchand, Eric, Hideaki Uchiyama, and Fabien Spindler. "Pose estimation for augmented reality: a hands-on survey." IEEE Transactions on Visualization and Computer Graphics 22.12 (2016): 2633-2651.

A PDF can be found here: https://hal.inria.fr/hal-01246370/document

The relevant part (pages 4 and 5) is screencapped below:

[Screenshot: Gauss-Newton Minimization]

Here is what I have done. First, I’ve (hopefully) “corrected” some errors:

(a) dt and dR can be passed by reference to exponential_map() (even though cv::Mat is essentially a pointer).

(b) The last entry of each point's 2x6 block of the Jacobian, J.at<double>(i*2+1,5), was -x[i].y but should be -x[i].x.

(c) I’ve also tried using a different formula for the projection, specifically one that includes the focal length and principal point:

xq.at<double>(i*2,0)   = cx + fx * cX.at<double>(0,0) / cX.at<double>(2,0);
xq.at<double>(i*2+1,0) = cy + fy * cX.at<double>(1,0) / cX.at<double>(2,0);

Here is the relevant code I am using, in its entirety (control starts at optimizePose3()):

// Exponential map of the 6-vector v = (vx, vy, vz, vtux, vtuy, vtuz):
// the rotational part (vtux, vtuy, vtuz) = theta*u gives dR = exp([theta u]_x),
// and dt is the translational part of the SE(3) exponential, dt = V * (vx, vy, vz)^T.
void exponential_map(const cv::Mat &v, cv::Mat &dt, cv::Mat &dR)
{
  double vx = v.at<double>(0,0);
  double vy = v.at<double>(1,0);
  double vz = v.at<double>(2,0);
  double vtux = v.at<double>(3,0);
  double vtuy = v.at<double>(4,0);
  double vtuz = v.at<double>(5,0);
  cv::Mat tu = (cv::Mat_<double>(3,1) << vtux, vtuy, vtuz); // theta u
  cv::Rodrigues(tu, dR);                                    // dR = exp([theta u]_x)
  double theta = sqrt(tu.dot(tu));
  // sin(theta)/theta, (1-cos(theta))/theta^2 and (theta-sin(theta))/theta^3,
  // with Taylor limits for numerical stability near theta = 0
  double sinc = (fabs(theta) < 1.0e-8) ? 1.0 : sin(theta) / theta;
  double mcosc = (fabs(theta) < 2.5e-4) ? 0.5 : (1.-cos(theta)) / theta / theta;
  double msinc = (fabs(theta) < 2.5e-4) ? (1./6.) : (1.-sin(theta)/theta) / theta / theta;
  // dt = V * (vx, vy, vz)^T, with V expanded element by element
  dt.at<double>(0,0) = vx*(sinc + vtux*vtux*msinc)
                     + vy*(vtux*vtuy*msinc - vtuz*mcosc)
                     + vz*(vtux*vtuz*msinc + vtuy*mcosc);
  dt.at<double>(1,0) = vx*(vtux*vtuy*msinc + vtuz*mcosc)
                     + vy*(sinc + vtuy*vtuy*msinc)
                     + vz*(vtuy*vtuz*msinc - vtux*mcosc);
  dt.at<double>(2,0) = vx*(vtux*vtuz*msinc - vtuy*mcosc)
                     + vy*(vtuy*vtuz*msinc + vtux*mcosc)
                     + vz*(sinc + vtuz*vtuz*msinc);
}


void optimizePose3(const PoseEstimation &pose, 
                   std::vector<FeatureMatch> &feature_matches,
                   PoseEstimation &optimized_pose) {

  //Set camera parameters
  double fx = camera_matrix.at<double>(0, 0); //Focal length
  double fy = camera_matrix.at<double>(1, 1);
  double cx = camera_matrix.at<double>(0, 2); //Principal point
  double cy = camera_matrix.at<double>(1, 2);


  auto inlier_matches = getInliers(pose, feature_matches);

  std::vector<cv::Point3d> wX;
  std::vector<cv::Point2d> x;

  const unsigned int npoints = inlier_matches.size();
  cv::Mat J(2*npoints, 6, CV_64F);
  double lambda = 0.25;
  cv::Mat xq(npoints*2, 1, CV_64F);
  cv::Mat xn(npoints*2, 1, CV_64F);
  double residual=0, residual_prev;
  cv::Mat Jp;


  for(auto i = 0u; i < npoints; i++) {

    //Model points
    const cv::Point2d &M = inlier_matches[i].model_point();

    wX.emplace_back(M.x, M.y, 0.0);


    //Imaged points
    const cv::Point2d &I = inlier_matches[i].image_point();

    xn.at<double>(i*2,0)   = I.x; // x
    xn.at<double>(i*2+1,0) = I.y; // y

    x.push_back(I);
  }


  //Initial estimation
  cv::Mat cRw = pose.rotation_matrix;
  cv::Mat ctw = pose.translation_vector;

  int nIters = 0;

  // Iterative Gauss-Newton minimization loop
  do {
    for (auto i = 0u; i < npoints; i++) {

      cv::Mat cX = cRw * cv::Mat(wX[i]) + ctw;            // Update cX, cY, cZ


      // Update x(q)
      //xq.at<double>(i*2,0)   = cX.at<double>(0,0) / cX.at<double>(2,0);    // x(q) = cX/cZ
      //xq.at<double>(i*2+1,0) = cX.at<double>(1,0) / cX.at<double>(2,0);    // y(q) = cY/cZ
      xq.at<double>(i*2,0)   = cx + fx * cX.at<double>(0,0) / cX.at<double>(2,0);
      xq.at<double>(i*2+1,0) = cy + fy * cX.at<double>(1,0) / cX.at<double>(2,0);

      // Update J using equation (11)
      J.at<double>(i*2,0) = -1 / cX.at<double>(2,0);        // -1/cZ
      J.at<double>(i*2,1) = 0; 
      J.at<double>(i*2,2) = x[i].x / cX.at<double>(2,0);    // x/cZ
      J.at<double>(i*2,3) = x[i].x * x[i].y;                // xy
      J.at<double>(i*2,4) = -(1 + x[i].x * x[i].x);         // -(1+x^2)
      J.at<double>(i*2,5) = x[i].y;                         // y

      J.at<double>(i*2+1,0) = 0;
      J.at<double>(i*2+1,1) = -1 / cX.at<double>(2,0);      // -1/cZ
      J.at<double>(i*2+1,2) = x[i].y / cX.at<double>(2,0);  // y/cZ
      J.at<double>(i*2+1,3) = 1 + x[i].y * x[i].y;          // 1+y^2
      J.at<double>(i*2+1,4) = -x[i].x * x[i].y;             // -xy
      J.at<double>(i*2+1,5) = -x[i].x;                      // -x
    }



    cv::Mat e_q = xq - xn;                     // Equation (7)


    cv::Mat Jp = J.inv(cv::DECOMP_SVD);        // Compute pseudo inverse of the Jacobian

    cv::Mat dq = -lambda * Jp * e_q;           // Equation (10)

    cv::Mat dctw(3, 1, CV_64F), dcRw(3, 3, CV_64F);
    exponential_map(dq, dctw, dcRw);

    cRw = dcRw.t() * cRw;                      // Update the pose
    ctw = dcRw.t() * (ctw - dctw);

    residual_prev = residual;                  // Memorize previous residual
    residual = e_q.dot(e_q);                   // Compute the actual residual

    std::cout << "residual_prev: " << residual_prev << std::endl;
    std::cout << "residual: " << residual << std::endl << std::endl;

    nIters++;

  } while (fabs(residual - residual_prev) > 0);
  //} while (nIters < 30);

  optimized_pose.rotation_matrix = cRw;
  optimized_pose.translation_vector = ctw;
  cv::Rodrigues(optimized_pose.rotation_matrix, optimized_pose.rotation_vector); 
}

Even when I use the functions as given, they do not produce the correct results. My initial pose estimate is very close to optimal, but when I run the program, the method takes a very long time to converge, and when it does, the results are very wrong. I’m not sure what could be wrong and I’m out of ideas. I’m confident my inliers are actually inliers (they were chosen using an M-estimator). I’ve compared the results from exponential_map() with those from other implementations, and they seem to agree.

So, where is the error in this Gauss-Newton implementation for pose optimization? I’ve tried to make things as easy as possible for anyone willing to lend a hand. Let me know if there is any more information I can provide. Any help would be greatly appreciated. Thanks.

Upvotes: 1

Views: 1075

Answers (1)

Catree

Reputation: 2517

Edit: 2019/05/13

There is now a cv::solvePnPRefineVVS() function in OpenCV.
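
A minimal sketch of how it can be called, assuming you already have the 3D/2D correspondences and an initial pose; camera_matrix and dist_coeffs stand in for your calibration results:

#include <opencv2/calib3d.hpp>   // cv::solvePnPRefineVVS (OpenCV >= 4.1)
#include <vector>

std::vector<cv::Point3d> objectPoints;  // matched 3D model points (fill with your data)
std::vector<cv::Point2d> imagePoints;   // corresponding 2D image points
cv::Mat camera_matrix, dist_coeffs;     // intrinsics and distortion from calibration
cv::Mat rvec, tvec;                     // initial pose estimate, e.g. from cv::solvePnP

// Refines rvec/tvec in place using a virtual visual servoing (VVS) scheme.
cv::solvePnPRefineVVS(objectPoints, imagePoints, camera_matrix, dist_coeffs,
                      rvec, tvec);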

Also, in the interaction matrix you should use the x and y computed from the current estimated pose, rather than the measured image points (see the sketch below).
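
For example, reusing the variable names from optimizePose3() in the question, the body of the inner loop could fill each pair of rows like this (a sketch of the change only; xc and yc are hypothetical names for the current-pose normalized coordinates):

// Project the model point with the CURRENT pose estimate and use its
// normalized coordinates and depth to fill the interaction matrix rows.
cv::Mat cX = cRw * cv::Mat(wX[i]) + ctw;
double xc   = cX.at<double>(0,0) / cX.at<double>(2,0); // x from current pose
double yc   = cX.at<double>(1,0) / cX.at<double>(2,0); // y from current pose
double Zinv = 1.0 / cX.at<double>(2,0);

J.at<double>(i*2,0)   = -Zinv;
J.at<double>(i*2,1)   = 0;
J.at<double>(i*2,2)   = xc * Zinv;
J.at<double>(i*2,3)   = xc * yc;
J.at<double>(i*2,4)   = -(1 + xc * xc);
J.at<double>(i*2,5)   = yc;

J.at<double>(i*2+1,0) = 0;
J.at<double>(i*2+1,1) = -Zinv;
J.at<double>(i*2+1,2) = yc * Zinv;
J.at<double>(i*2+1,3) = 1 + yc * yc;
J.at<double>(i*2+1,4) = -xc * yc;
J.at<double>(i*2+1,5) = -xc;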


In the cited paper, they expressed the measurements x in the normalized camera frame (at z=1).

When working with real data, you have:

  • (u,v): 2D image coordinates (e.g. keypoints, corner locations, etc.)
  • K: the intrinsic parameters (obtained after calibrating the camera)
  • D: the distortion coefficients (obtained after calibrating the camera)

To compute the 2D image coordinates in the normalized camera frame, you can use OpenCV's cv::undistortPoints() function (see my earlier answer about cv::projectPoints() and cv::undistortPoints()).

When there is no distortion, the computation (also called "reverse perspective transformation") is:

  • x = (u - cx) / fx
  • y = (v - cy) / fy
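
A small sketch of both routes, reusing camera_matrix from the question; dist_coeffs is a placeholder for the calibrated distortion coefficients:

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point2d> pixels = { {320.0, 240.0} };  // measured (u,v) image points
std::vector<cv::Point2d> normalized;

// With distortion: cv::undistortPoints() undistorts and, when no P matrix is
// given, returns coordinates in the normalized camera frame (z = 1).
cv::undistortPoints(pixels, normalized, camera_matrix, dist_coeffs);

// Without distortion: the reverse perspective transformation by hand.
double fx = camera_matrix.at<double>(0,0), fy = camera_matrix.at<double>(1,1);
double cx = camera_matrix.at<double>(0,2), cy = camera_matrix.at<double>(1,2);
normalized.clear();
for (const auto &p : pixels)
  normalized.emplace_back((p.x - cx) / fx, (p.y - cy) / fy);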

Upvotes: 3
