AskIOS

Reputation: 11

Draw bounding box using iOS ARSCNView

I'm using ARKit's ARSCNView to develop a prototype. The goal is to scan products arranged in different racks on a shelf (think of a grocery store where bags of chips are laid out across the racks of a large shelf). We have built a .mlpackage model that returns product predictions, each with its bounding-box coordinates, for a given input image. We use the ARSessionDelegate method func session(_ session: ARSession, didUpdate frame: ARFrame) to capture the live camera feed.

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // capturedImage is already a CVPixelBuffer
    let pixelBuffer = frame.capturedImage
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
            try handler.perform([self.visionRequest])
        } catch {
            print("error classifying frame: \(error)")
        }
    }
}

We use the Vision framework's VNImageRequestHandler to process the requests, as shown below.

lazy var visionRequest: VNCoreMLRequest = {
    let request = VNCoreMLRequest(model: mlModel) { [weak self] request, error in
        DispatchQueue.main.async {
            self?.processPredictions(predictions: request.results as? [VNRecognizedObjectObservation] ?? [])
        }
    }
    return request
}()
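
For reference, mlModel is created roughly along these lines; ProductDetector is just a placeholder name here for the class Xcode generates from the .mlpackage.

lazy var mlModel: VNCoreMLModel = {
    do {
        let configuration = MLModelConfiguration()
        // ProductDetector is a placeholder for the auto-generated model class
        let coreMLModel = try ProductDetector(configuration: configuration).model
        return try VNCoreMLModel(for: coreMLModel)
    } catch {
        fatalError("failed to load Core ML model: \(error)")
    }
}()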

What we want to achieve is this: when visionRequest returns its predictions, we call processPredictions to handle them, and we expect to add an anchor for each prediction received. The code below implements that. convertToWorldPosition is a helper that uses ARFrame's hitTest method to return a world transform for a given point (which is simply the centroid of each prediction's bounding box).

func processPredictions(predictions: [VNRecognizedObjectObservation]) {
    print("count \(predictions.count)")
    overlayLayer.sublayers?.forEach { $0.removeFromSuperlayer() }
    for prediction in predictions {
        let boundingBox = prediction.boundingBox
        let centroid = CGPoint(
            x: boundingBox.origin.x + boundingBox.width / 2,
            y: boundingBox.origin.y + boundingBox.height / 2
        )

        if let worldTransform = convertToWorldPosition(from: centroid, frame: self.arView.session.currentFrame!) {
            let anchor = ARAnchor(name: "Detected Object", transform: worldTransform)
            sceneView.session.add(anchor: anchor)
            lastAnchors[anchor.identifier] = worldTransform
        }
    }
}

private func convertToWorldPosition(from point: CGPoint, frame: ARFrame) -> simd_float4x4? {
    let hitTestResults = frame.hitTest(point, types: [.featurePoint, .estimatedHorizontalPlane])
    return hitTestResults.first?.worldTransform
}

With this logic, the number of anchors I manage to add doesn't match the number of predictions returned by the model. Also, some of the markers that do appear in the scene view aren't positioned accurately with respect to their bounding boxes (CGRects).

Can someone please suggest what could be wrong in this implementation?

Upvotes: 0

Views: 32

Answers (0)
