fant

Reputation: 21

ARKit project point with previous device position

I'm combining ARKit with a CNN to constantly update ARKit nodes when they drift. So:

  1. Get estimate of node position with ARKit and place a virtual object in the world
  2. Use CNN to get its estimated 2D location of the object
  3. Update node position accordingly (to refine its location in 3D space)

The problem is that step #2 takes about 0.3 s, so I can't simply use sceneView.unprojectPoint: the CNN's 2D point corresponds to the device's world position from step #1, not to wherever the device is by the time the result arrives.

How do I calculate the 3D vector from my old location to the CNN's 2D point?

Upvotes: 2

Views: 1047

Answers (1)

rickster

Reputation: 126177

unprojectPoint is just a matrix-math convenience function similar to those found in many graphics-oriented libraries (like DirectX, old-style OpenGL, Three.js, etc). In SceneKit, it's provided as a method on the view, which means it operates using the model/view/projection matrices and viewport the view currently uses for rendering. However, if you know how that function works, you can implement it yourself.

An Unproject function typically does two things:

  1. Convert viewport coordinates (pixels) to the clip-space coordinate system (-1.0 to 1.0 in all directions).

  2. Reverse the projection transform (assuming some arbitrary Z value in clip space) and the view (camera) transform to get to 3D world-space coordinates.

Given that knowledge, we can build our own function. (Warning: untested.)

import simd          // float3 / float4 / float4x4 vector and matrix types
import CoreGraphics  // CGRect

func unproject(screenPoint: float3, // see below for Z depth hint discussion
                 modelView: float4x4,
                projection: float4x4,
                  viewport: CGRect) -> float3 {

    // viewport to clip: subtract viewport origin, divide by size,
    // scale/offset from 0...1 to -1...1 coordinate space
    let clip = (screenPoint - float3(Float(viewport.minX), Float(viewport.minY), 0.0))
               / float3(Float(viewport.width), Float(viewport.height), 1.0)
               * 2.0 - 1.0
    // apply the reverse of the model-view-projection transform
    let inversePM = (projection * modelView).inverse
    let result = inversePM * float4(clip.x, clip.y, clip.z, 1.0)
    return float3(result.x, result.y, result.z) / result.w // perspective divide
}

Now, to use it... The modelView matrix you pass to this function is the inverse of ARCamera.transform, and you can also get projectionMatrix directly from ARCamera. So, if you're grabbing a 2D position at one point in time, grab the camera matrices then, too, so that you can work backward to 3D as of that time.
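For example, here's a minimal sketch of that capture-then-unproject flow. (The names sceneView, cnnPoint, and the saved* variables are placeholders for your own setup, and the viewport is assumed to be in the same coordinate space your CNN reports points in.)

// At the time you send a frame to the CNN, save the camera state for later.
let camera = sceneView.session.currentFrame!.camera
let savedModelView = camera.transform.inverse   // view matrix = inverse of the camera transform
let savedProjection = camera.projectionMatrix   // or projectionMatrix(for:viewportSize:zNear:zFar:) to match the view
let savedViewport = sceneView.bounds

// ...about 0.3 s later, when the CNN returns its 2D point (cnnPoint)...
let worldPoint = unproject(screenPoint: float3(Float(cnnPoint.x), Float(cnnPoint.y), 0.5), // 0.5 = arbitrary depth guess
                           modelView: savedModelView,
                           projection: savedProjection,
                           viewport: savedViewport)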

There's still the issue of that "Z depth hint" I mentioned: when the renderer projects 3D to 2D it loses information (one of those D's, actually). So you have to recover or guess that information when you convert back to 3D — the screenPoint you pass in to the above function is the x and y pixel coordinates, plus a depth value between 0 and 1. Zero is closer to the camera, 1 is farther away. How you make use of that sort of depends on how the rest of your algorithm is designed. (At the very least, you can unproject both Z=0 and Z=1, and you'll get the endpoints of a line segment in 3D, with your original point somewhere along that line — see the sketch below.)
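If you don't have a good depth guess, that two-endpoint trick might look like this (again assuming the saved matrices and the unproject function from above; how you then pick a point along the ray — intersecting it with a plane, or taking the point closest to the node's old position — depends on your algorithm):

// Unproject the same 2D point at the near (z = 0) and far (z = 1) depths
// to recover the 3D ray that the CNN's 2D point lies on.
let px = Float(cnnPoint.x), py = Float(cnnPoint.y)
let nearPoint = unproject(screenPoint: float3(px, py, 0),
                          modelView: savedModelView,
                          projection: savedProjection,
                          viewport: savedViewport)
let farPoint  = unproject(screenPoint: float3(px, py, 1),
                          modelView: savedModelView,
                          projection: savedProjection,
                          viewport: savedViewport)
let rayDirection = simd_normalize(farPoint - nearPoint)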

Of course, whether this can actually be put together with your novel CNN-based approach is another question entirely. But at least you learned some useful 3D graphics math!

Upvotes: 5
