Reputation: 325
Last couple of weeks I've been working on developing a simple proof-of-concept application in which a 3D model is projected over a specific Augmented Reality marker (in my case I am using Aruco markers) in IOS (with Swift and Objective-C)
I calibrated an Ipad Camera with a specific fixed lens position and used that to estimate the pose of the AR marker (which from my debug analysis seem pretty accurate). The problem seems (surprise, surprise) when I try to use SceneKit scene to project a model over the marker.
I am aware that the axis in opencv and SceneKit are different (Y and Z) and already done this correction as well as the row order/column order difference between the two libraries.
After constructing the projection matrix, I apply that same transform to the 3D model and from my debug analysis the object seems to be translated to the desired position and with the desired rotation. The problem is that it never does overlap the specific image pixel position of the marker. I am using a AVCapturePreviewVideoLayer as to put the video in background that has the same bounds as my SceneKit View.
Has anyone has a clue why this happens? I tried to play with cameras FOV's but with no real impact in the results.
Thank you all for your time.
EDIT1: I Will post some of the code here to reveal what I am currently doing.
I have two subviews inside the main view, one which is a background AVCaptureVideoPreviewLayer and another which is a SceneKitView. Both have the same bounds as the main view.
At each frame I use an opencv wrapper which outputs the pose of each marker:
std::vector<int> ids;
std::vector<std::vector<cv::Point2f>> corners, rejected;
cv::aruco::detectMarkers(frame, _dictionary, corners, ids, _detectorParams, rejected);
if (ids.size() > 0 ){
cv::aruco::drawDetectedMarkers(frame, corners, ids);
cv::Mat rvecs, tvecs;
cv::aruco::estimatePoseSingleMarkers(corners, 2.6, _intrinsicMatrix, _distCoeffs, rvecs, tvecs);
// Let's protect ourselves agains multiple markers
if (rvecs.total() > 1)
return;
_markerFound = true;
cv::Rodrigues(rvecs, _currentR);
_currentT = tvecs;
for (int row = 0; row < _currentR.rows; row++){
for (int col = 0; col < _currentR.cols; col++){
_currentExtrinsics.at<double>(row, col) = _currentR.at<double>(row, col);
}
_currentExtrinsics.at<double>(row, 3) = _currentT.at<double>(row);
}
_currentExtrinsics.at<double>(3,3) = 1;
std::cout << tvecs << std::endl;
// Convert coordinate systems of opencv to openGL (SceneKit)
// Note that in openCV z goes away the camera (in openGL goes into the camera)
// and y points down and on openGL point up
// Another note: openCV has a column order matrix representation, while SceneKit
// has a row order matrix, but we'll take care of it later.
cv::Mat cvToGl = cv::Mat::zeros(4, 4, CV_64F);
cvToGl.at<double>(0,0) = 1.0f;
cvToGl.at<double>(1,1) = -1.0f; // invert the y axis
cvToGl.at<double>(2,2) = -1.0f; // invert the z axis
cvToGl.at<double>(3,3) = 1.0f;
_currentExtrinsics = cvToGl * _currentExtrinsics;
cv::aruco::drawAxis(frame, _intrinsicMatrix, _distCoeffs, rvecs, tvecs, 5);
Then in each frame I convert the opencv matrix for a SCN4Matrix:
- (SCNMatrix4) transformToSceneKit:(cv::Mat&) openCVTransformation{
SCNMatrix4 mat = SCNMatrix4Identity;
// Transpose
openCVTransformation = openCVTransformation.t();
// copy the rotationRows
mat.m11 = (float) openCVTransformation.at<double>(0, 0);
mat.m12 = (float) openCVTransformation.at<double>(0, 1);
mat.m13 = (float) openCVTransformation.at<double>(0, 2);
mat.m14 = (float) openCVTransformation.at<double>(0, 3);
mat.m21 = (float)openCVTransformation.at<double>(1, 0);
mat.m22 = (float)openCVTransformation.at<double>(1, 1);
mat.m23 = (float)openCVTransformation.at<double>(1, 2);
mat.m24 = (float)openCVTransformation.at<double>(1, 3);
mat.m31 = (float)openCVTransformation.at<double>(2, 0);
mat.m32 = (float)openCVTransformation.at<double>(2, 1);
mat.m33 = (float)openCVTransformation.at<double>(2, 2);
mat.m34 = (float)openCVTransformation.at<double>(2, 3);
//copy the translation row
mat.m41 = (float)openCVTransformation.at<double>(3, 0);
mat.m42 = (float)openCVTransformation.at<double>(3, 1)+2.5;
mat.m43 = (float)openCVTransformation.at<double>(3, 2);
mat.m44 = (float)openCVTransformation.at<double>(3, 3);
return mat;
}
At each frame in which the AR marker is found I add a box to the scene and apply the transformation to the object node:
SCNBox *box = [SCNBox boxWithWidth:5.0 height:5.0 length:5.0 chamferRadius:0.0];
_boxNode = [SCNNode nodeWithGeometry:box];
if (found){
[self.delegate returnExtrinsicsMat:extrinsicMatrixOfTheMarker];
Mat R, T;
[self.delegate returnRotationMat:R];
[self.delegate returnTranslationMat:T];
SCNMatrix4 Transformation;
Transformation = [self transformToSceneKit:extrinsicMatrixOfTheMarker];
//_cameraNode.transform = SCNMatrix4Invert(Transformation);
[_sceneKitScene.rootNode addChildNode:_cameraNode];
//_cameraNode.camera.projectionTransform = SCNMatrix4Identity;
//_cameraNode.camera.zNear = 0.0;
_sceneKitView.pointOfView = _cameraNode;
_boxNode.transform = Transformation;
[_sceneKitScene.rootNode addChildNode:_boxNode];
//_boxNode.position = SCNVector3Make(Transformation.m41, Transformation.m42, Transformation.m43);
std::cout << (_boxNode.position.x) << " " << (_boxNode.position.y) << " " << (_boxNode.position.z) << std::endl << std::endl;
}
For example if the translation vector is (-1, 5, 20) the object appears in the scene in position (-1, -5, -20) in the scene, and the rotation is correct also. The problem is that it never appears in the correct position in the background image. I will add some images to show the result.
Does anyone know why this is happening?
Upvotes: 3
Views: 1224
Reputation: 325
Found out the solution. Instead of applying the transform to the node of the object I applied the inverted transformation matrix to the camera node. Then for the camera perspective transform matrix I applied the following matrix:
projection = SCNMatrix4Identity
projection.m11 = (2 * (float)(cameraMatrix[0])) / -(ImageWidth*0.5)
projection.m12 = (-2 * (float)(cameraMatrix[1])) / (ImageWidth*0.5)
projection.m13 = (width - (2 * Float(cameraMatrix[2]))) / (ImageWidth*0.5)
projection.m22 = (2 * (float)(cameraMatrix[4])) / (ImageHeight*0.5)
projection.m23 = (-height + (2 * (float)(cameraMatrix[5]))) / (ImageHeight*0.5)
projection.m33 = (-far - near) / (far - near)
projection.m34 = (-2 * far * near) / (far - near)
projection.m43 = -1
projection.m44 = 0
being far and near the z clipping planes.
I also had to correct the box initial position to center it on the marker.
Upvotes: 3