Reputation: 395
I am attempting to find a simple way in SceneKit to calculate the depth of a pixels in SceneKit and LiDAR data from
sceneView.session.currentFrame?.smoothedSceneDepth?.depthMap
Ideally I don't want to use metal shaders. I would prefer find a points in my currentFrame
and their corresponding depth map, to get the depth of a points in SceneKit (ideally in world coordinates, not just local to that frustum at that point in time).
Fast performance isn't necessary as it won't be calculated at capture.
I am aware of the Apple project at link, however this is far too complex for my needs.
As a starting point, my code works like this:
guard let depthData = frame.sceneDepth else { return }
let camera = frame.camera
let depthPixelBuffer = depthData.depthMap
let depthHeight = CVPixelBufferGetHeight(depthPixelBuffer)
let depthWidth = CVPixelBufferGetWidth(depthPixelBuffer)
let resizeScale = CGFloat(depthWidth) / CGFloat(CVPixelBufferGetWidth(frame.capturedImage))
let resizedColorImage = frame.capturedImage.toCGImage(scale: resizeScale);
guard let colorData = resizedColorImage.pixelData() else {
fatalError()
}
var intrinsics = camera.intrinsics;
let referenceDimensions = camera.imageResolution;
let ratio = Float(referenceDimensions.width) / Float(depthWidth)
intrinsics.columns.0[0] /= ratio
intrinsics.columns.1[1] /= ratio
intrinsics.columns.2[0] /= ratio
intrinsics.columns.2[1] /= ratio
var points: [SCNVector3] = []
let depthValues = depthPixelBuffer.depthValues()
for vv in 0..<depthHeight {
for uu in 0..<depthWidth {
let z = -depthValues[uu + vv * depthWidth]
let x = Float32(uu) / Float32(depthWidth) * 2.0 - 1.0;
let y = 1.0 - Float32(vv) / Float32(depthHeight) * 2.0;
points.append(SCNVector3(x, y, z))
}
}
The resulting point cloud looks ok, but is severely bent on the Z-axis. I realize this code is also not adjusting for screen orientation either.
Upvotes: 4
Views: 1155
Reputation: 395
Cupertino kindly got back to me with this response on the forums at developer.apple.com:
The unprojection calculation itself is going to be identical, regardless of whether it is done CPU side or GPU side.
CPU side, the calculation would look something like this:
/// Returns a world space position given a point in the camera image, the eye space depth (sampled/read from the corresponding point in the depth image), the inverse camera intrinsics, and the inverse view matrix.
func worldPoint(cameraPoint: SIMD2<Float>, eyeDepth: Float, cameraIntrinsicsInversed: simd_float3x3, viewMatrixInversed: simd_float4x4) -> SIMD3<Float> {
let localPoint = cameraIntrinsicsInversed * simd_float3(cameraPoint, 1) * -eyeDepth
let worldPoint = viewMatrixInversed * simd_float4(localPoint, 1);
return (worldPoint / worldPoint.w)[SIMD3(0,1,2)];
}
Implemented, this looks like
for vv in 0..<depthHeight {
for uu in 0..<depthWidth {
let z = -depthValues[uu + vv * depthWidth]
let viewMatInverted = (sceneView.session.currentFrame?.camera.viewMatrix(for: UIApplication.shared.statusBarOrientation))!.inverse
let worldPoint = worldPoint(cameraPoint: SIMD2(Float(uu), Float(vv)), eyeDepth: z, cameraIntrinsicsInversed: intrinsics.inverse, viewMatrixInversed: viewMatInverted * rotateToARCamera )
points.append(SCNVector3(worldPoint))
}
}
The point cloud is pretty messy, needs confidence worked out, and there are gaps vertically where Int rounding has occurred, but it's a solid start. Missing functions come from the link to the Apple demo project in the question above.
Upvotes: 3