William J Bagshaw

Reputation: 565

Why are there negative coordinates in the normalised object detection results? (CoreML, Vision, Swift, iOS)

I compiled the example.

https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture

It did not work correctly for me on an iPhone 7 Plus. The rectangles drawn did not cover the items detected.

I created an app of my own to investigate. The detected objects are returned as normalised bounds; however, the bounds can be negative in the Y direction. Adding a correction of 0.2 brings them back into alignment.

The detection appears to crop a square from the center of the portrait frame before running the model. I created a square overlay, and when the object moves out of the square, either to the top or the bottom, detection stops. The top and bottom of that square correspond to 0 and 1.0 in normalised coordinates.
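
If the boxes really are normalised to that center square rather than to the full portrait frame, mapping them back would need a scale and a Y offset. The sketch below only illustrates that assumption; cropSquareRectToFullFrame is a hypothetical helper, not part of Vision.

    import CoreGraphics

    /// Map a rect normalised to a centered square crop back to normalised
    /// coordinates of the full portrait frame. Assumes the crop is the largest
    /// centered square (frameWidth x frameWidth) and that frameHeight > frameWidth.
    func cropSquareRectToFullFrame(_ rect: CGRect,
                                   frameWidth: CGFloat,
                                   frameHeight: CGFloat) -> CGRect {
        let scale = frameWidth / frameHeight      // crop height as a fraction of frame height
        let yOffset = (1.0 - scale) / 2.0         // normalised gap above and below the crop square
        return CGRect(x: rect.minX,               // X is unchanged: the crop spans the full width
                      y: yOffset + rect.minY * scale,
                      width: rect.width,
                      height: rect.height * scale)
    }

For a 16:9 portrait frame that offset works out to (1 - 9/16) / 2 ≈ 0.22, which is close to the 0.2 correction mentioned above.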

The test app passes the data from captureOutput to a VNImageRequestHandler. The code that sets up the request is also below. Any idea why the observations are sometimes negative in the Y direction? Why do I need to add an offset to bring them back into the unit square and align them with the image?

I have set the camera to 4K in my test app and have not yet tried any other settings.
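
For reference, the 4K setting corresponds to a capture-session preset along these lines; the real session setup is not part of the posted code, so this is only a hypothetical sketch.

    import AVFoundation

    // Hypothetical capture-session configuration; the test app's actual setup is not shown.
    let session = AVCaptureSession()
    session.beginConfiguration()
    if session.canSetSessionPreset(.hd4K3840x2160) {
        session.sessionPreset = .hd4K3840x2160   // 3840 x 2160 buffers
    }
    session.commitConfiguration()

The captureOutput callback below then receives those buffers and hands them to Vision.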

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }

        //let exifOrientation = exifOrientationFromDeviceOrientation()
        let exifOrientation = CGImagePropertyOrientation.up
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation, options: [:])
        do {
            try imageRequestHandler.perform(self.requests)
        } catch {
            print(error)
        }
    }
    @discardableResult
    func setupVision() -> NSError? {
        // Setup Vision parts
        let error: NSError! = nil

        guard let modelURL = Bundle.main.url(forResource: "ResistorModel", withExtension: "mlmodelc") else {
            return NSError(domain: "VisionObjectRecognitionViewController", code: -1, userInfo: [NSLocalizedDescriptionKey: "Model file is missing"])
        }
        do {
            let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))
            let objectRecognition = VNCoreMLRequest(model: visionModel, completionHandler: { (request, error) in
                DispatchQueue.main.async(execute: {
                    // perform all the UI updates on the main queue
                    if let results = request.results {
                        self.drawVisionRequestResults(results)
                    }
                })
            })
            self.requests = [objectRecognition]
        } catch let error as NSError {
            print("Model loading went wrong: \(error)")
        }

        return error
    }


    func drawVisionRequestResults(_ results: [Any]) {
        var pipCreated = false
        CATransaction.begin()
        CATransaction.setValue(kCFBooleanTrue, forKey: kCATransactionDisableActions)
        detectionOverlay.sublayers = nil // remove all the old recognized objects
        for observation in results where observation is VNRecognizedObjectObservation {
            guard let objectObservation = observation as? VNRecognizedObjectObservation else {
                continue
            }
            // Select only the label with the highest confidence.
            let topLabelObservation = objectObservation.labels[0]
            if topLabelObservation.identifier == "resistor" {
                if (objectObservation.boundingBox.minX < 0.5) && (objectObservation.boundingBox.maxX > 0.5) && (objectObservation.boundingBox.minY < 0.3) && (objectObservation.boundingBox.maxY > 0.3) {
                    //print(objectObservation.boundingBox.minX)
                    //print(objectObservation.boundingBox.minY)

                    let bb = CGRect(x: objectObservation.boundingBox.minX, y: 0.8 - objectObservation.boundingBox.maxY, width: objectObservation.boundingBox.width, height: objectObservation.boundingBox.height)
                    //let bb = CGRect(x: 0.5,y: 0.5,width: 0.5,height: 0.5)
                        //let objectBounds = VNImageRectForNormalizedRect(bb, 500, 500)
                    let objectBounds = VNImageRectForNormalizedRect(bb, Int(detectionOverlay.bounds.width), Int(detectionOverlay.bounds.width))

                    //print(objectBounds)
                    //print(objectBounds.minX)
                    //print(objectBounds.minY)
                    //print(objectBounds.width)
                    //print(objectBounds.height)

                    print(objectObservation.boundingBox)

                    let textLayer = self.createTextSubLayerInBounds(objectBounds,
                                                                    identifier: topLabelObservation.identifier,
                                                                    confidence: topLabelObservation.confidence)

                    let shapeLayer = self.createRoundedRectLayerWithBounds(objectBounds)

                    shapeLayer.addSublayer(textLayer)
                    detectionOverlay.addSublayer(shapeLayer)

                    if !pipCreated {
                        pipCreated = true
                        let pip = Pip(imageBuffer: self.imageBuffer!)
                        if self.pip {
                            pipView.image = pip?.uiImage
                        } else {
                            pipView.image = nil
                        }
                    }
                }
            }
        }
        CATransaction.commit()
        doingStuff = false
    }
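
For what it's worth, the y: 0.8 - maxY used when building bb above is arithmetically (1 - maxY) - 0.2: the usual flip from Vision's lower-left-origin normalised space into the layer's top-left-origin space, plus the roughly 0.2 correction described earlier. A minimal sketch of just the flip, assuming the overlay shows the full frame (layerRect and overlaySize are hypothetical names standing in for detectionOverlay.bounds.size):

    import Vision
    import CoreGraphics

    // Convert a Vision bounding box (normalised, origin at lower-left) into a rect
    // in a layer's coordinate space (origin at top-left), assuming the layer covers
    // the whole frame.
    func layerRect(for boundingBox: CGRect, overlaySize: CGSize) -> CGRect {
        let flipped = CGRect(x: boundingBox.minX,
                             y: 1.0 - boundingBox.maxY,   // flip the Y axis
                             width: boundingBox.width,
                             height: boundingBox.height)
        return VNImageRectForNormalizedRect(flipped,
                                            Int(overlaySize.width),
                                            Int(overlaySize.height))
    }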

Upvotes: 0

Views: 1791

Answers (1)

William J Bagshaw

Reputation: 565

I'm not sure why it behaved as it did. However, I wanted it to use the whole image for the object detection and to return bounding boxes normalised to the original portrait input. Note also that the model was trained this way.

There is a thread, https://github.com/apple/turicreate/issues/1016, covering this exact issue. The Apple example does not work, and it still does not work when you swap in your own model.

The solution, towards the end of that thread, is to set:

    objectRecognition.imageCropAndScaleOption = .scaleFill

This made the detection use the whole image and produced bounding boxes that were normalised to the whole image, with no more arbitrary offset. It may be that the training geometry and the detection geometry have to match for the bounding box to be calculated correctly, but I'm not sure why.
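
In context, the option is set on the VNCoreMLRequest created in setupVision, roughly like this (abridged from the code in the question):

    do {
        let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))
        let objectRecognition = VNCoreMLRequest(model: visionModel) { request, error in
            // ... handle results on the main queue as before ...
        }
        // Scale the whole frame to the model's input size instead of taking a center crop;
        // the returned bounding boxes are then normalised to the full image.
        objectRecognition.imageCropAndScaleOption = .scaleFill
        self.requests = [objectRecognition]
    } catch let error as NSError {
        print("Model loading went wrong: \(error)")
    }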

Upvotes: 0
