Reputation: 2913
This may be an obscure question, but I see lots of very cool samples online of how people are using the new people occlusion technology in ARKit 3 to effectively "separate" the people from the background and apply some sort of filtering to the "people" (see here).
In looking at Apple's provided source code and documentation, I see that I can retrieve the segmentationBuffer from an ARFrame, which I've done like so:
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let image = frame.capturedImage
    if let segmentationBuffer = frame.segmentationBuffer {
        // Get the segmentation buffer's width.
        let segmentedWidth = CVPixelBufferGetWidth(segmentationBuffer)
        // Create the mask image from that pixel buffer.
        let segmentationMaskImage = CIImage(cvPixelBuffer: segmentationBuffer, options: [:])
        // Smooth edges to create an alpha matte, then upscale it to the RGB resolution.
        let alphaUpscaleFactor = Float(CVPixelBufferGetWidth(image)) / Float(segmentedWidth)
        let alphaMatte = segmentationMaskImage.clampedToExtent()
            .applyingFilter("CIGaussianBlur", parameters: ["inputRadius": 2.0])
            .cropped(to: segmentationMaskImage.extent)
            .applyingFilter("CIBicubicScaleTransform", parameters: ["inputScale": alphaUpscaleFactor])
        // Unknown...
    }
}
In the "unknown" section, I am trying to determine how I would render my new "blurred" person on top of the original camera feed. There does not seem to be any methods to draw the new CIImage on "top" of the original camera feed, as the ARView has no way of being manually updated.
Upvotes: 3
Views: 3223
Reputation: 13462
The Bringing People into AR WWDC session has some information, especially about ARMatteGenerator. The session also comes with sample code.
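For reference, here is a minimal sketch of how ARMatteGenerator might be wired into a Metal-based renderer; the MatteProvider wrapper and its setup are illustrative assumptions, not code from the WWDC sample:
import ARKit
import Metal

// A sketch of generating a people alpha matte with ARMatteGenerator.
// `MatteProvider` is an illustrative wrapper, not part of ARKit.
final class MatteProvider {
    private let commandQueue: MTLCommandQueue
    private let matteGenerator: ARMatteGenerator

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        commandQueue = queue
        // .half trades matte resolution for performance; .full is also available.
        matteGenerator = ARMatteGenerator(device: device, matteResolution: .half)
    }

    // Encodes matte generation for the given frame and returns the resulting texture.
    // The texture contents are valid once the command buffer has executed on the GPU.
    func makeMatte(for frame: ARFrame) -> MTLTexture? {
        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return nil }
        let matte = matteGenerator.generateMatte(from: frame, commandBuffer: commandBuffer)
        commandBuffer.commit()
        return matte
    }
}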
Upvotes: 2
Reputation: 58113
In the following code snippet we see the personSegmentationWithDepth type property used for depth compositing (there are RGB, Alpha, and Depth channels):
// Automatically segmenting and then compositing foreground (people),
// middle-ground (3D model) and background.
let session = ARSession()
let configuration = ARWorldTrackingConfiguration()
if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
    configuration.frameSemantics.insert(.personSegmentationWithDepth)
}
session.run(configuration)
You can manually access the depth data of a world-tracking session as a CVPixelBuffer (depth values for the performed segmentation):
let image = frame.estimatedDepthData
And you can manually access the depth data of a face-tracking session as a CVPixelBuffer (from the TrueDepth camera):
let image = session.currentFrame?.capturedDepthData?.depthDataMap
Also, there's a generateDilatedDepth instance method (on ARMatteGenerator) in ARKit 3.0:
func generateDilatedDepth(from frame: ARFrame,
                          commandBuffer: MTLCommandBuffer) -> MTLTexture
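A hedged sketch of a call site, assuming matteGenerator is an ARMatteGenerator instance and commandBuffer comes from your own Metal command queue:
// `matteGenerator` and `commandBuffer` are assumptions from your own Metal setup.
let dilatedDepth = matteGenerator.generateDilatedDepth(from: frame,
                                                       commandBuffer: commandBuffer)
// `dilatedDepth` can then be sampled in a shader alongside the generated matte.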
In your case you have to use estimatedDepthData, because the Apple documentation says:
It's a buffer that represents the estimated depth values from the camera feed that you use to occlude virtual content.
var estimatedDepthData: CVPixelBuffer? { get }
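As a small sketch (assuming you are inside session(_:didUpdate:) and want to inspect the depth values in Core Image):
// Assumes this runs inside session(_:didUpdate:) with access to `frame`.
if let depthBuffer = frame.estimatedDepthData {
    // The depth buffer becomes a CIImage that can be upscaled to the camera
    // resolution and used as a mask or for depth-based grading.
    let depthImage = CIImage(cvPixelBuffer: depthBuffer)
    print(depthImage.extent)  // the depth resolution is lower than the RGB capture
}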
If you multiply the DEPTH data from this buffer (first you have to convert the depth channel to RGB) by the RGB or ALPHA data using compositing techniques, you'll get awesome effects.
Look at these six images: the lower row represents three RGB images corrected with the depth channel: depth grading, depth blurring, and a depth point-position pass.
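To illustrate the compositing idea with the alpha matte from the question, here is a minimal Core Image sketch; the function name and the cameraImage/alphaMatte parameters are assumptions, and CIBlendWithMask is just one way to combine a masked effect with the camera feed:
import CoreImage

// A minimal compositing sketch: blur only the people, using the alpha matte
// from the question as the mask. `cameraImage` and `alphaMatte` are assumed
// to be CIImages at the same resolution.
func blurPeople(in cameraImage: CIImage, using alphaMatte: CIImage) -> CIImage {
    // Blur the whole camera image first...
    let blurred = cameraImage
        .clampedToExtent()
        .applyingFilter("CIGaussianBlur", parameters: ["inputRadius": 8.0])
        .cropped(to: cameraImage.extent)

    // ...then keep the blurred pixels only where the matte says "person",
    // and the original camera pixels everywhere else.
    return blurred.applyingFilter("CIBlendWithMask", parameters: [
        kCIInputBackgroundImageKey: cameraImage,
        kCIInputMaskImageKey: alphaMatte
    ])
}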
Upvotes: 5