Reputation: 565
I compiled the example.
https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture
It did not work correctly for me on an iPhone 7 Plus. The rectangles drawn did not cover the items detected.
I created an app of my own to investigate. The detected objects are returned as normalized bounds; however, the bounds can be negative in the Y direction, and adding a correction of 0.2 brings them back into alignment.
The detection appears to crop a square from the center of the portrait frame and run the detection on that. I created a square overlay, and when the object moves out of the square, either above or below it, the detection stops. The top and bottom of the square correspond to 0 and 1.0 in the normalized coordinates.
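Concretely, the empirical correction amounts to something like this (a minimal sketch; correctedBox is just an illustrative helper name and 0.2 is the experimentally found offset). Combined with the usual bottom-left to top-left flip, y = 1 - maxY, it gives exactly the 0.8 - maxY used in the drawing code further down.

import CoreGraphics

// Sketch of the empirical workaround described above: shift the normalized box
// by 0.2 in Vision's bottom-left-origin space, then flip to a top-left-origin
// layer space. Note 1.0 - (maxY + 0.2) == 0.8 - maxY.
func correctedBox(for box: CGRect) -> CGRect {
    let shifted = box.offsetBy(dx: 0, dy: 0.2)   // add the 0.2 Y correction
    return CGRect(x: shifted.minX,
                  y: 1.0 - shifted.maxY,         // flip for a top-left origin
                  width: shifted.width,
                  height: shifted.height)
}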
The test app passes the data from captureOutput to a VNImageRequestHandler. The code that sets up the request is also below. Any idea why the observations are sometimes negative in the Y direction, and why I need to add an offset to bring them back into the unit square and align them with the image?
I have set the camera to 4K in my test app. Not yet tried any other settings.
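For reference, the capture session in the test app is configured roughly like this (a sketch only, since that part of the code is not shown in the question; setupCapture, session and videoQueue are assumed names):

import AVFoundation

// Sketch of the assumed capture setup: 4K preset, back camera, sample buffers
// delivered to the delegate that implements captureOutput below.
func setupCapture() {
    let session = AVCaptureSession()
    session.sessionPreset = .hd4K3840x2160   // 4K, as mentioned above
    guard let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back),
          let input = try? AVCaptureDeviceInput(device: device) else { return }
    session.addInput(input)
    let videoOutput = AVCaptureVideoDataOutput()
    videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
    session.addOutput(videoOutput)
    session.startRunning()
}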
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }

    //let exifOrientation = exifOrientationFromDeviceOrientation()
    let exifOrientation = CGImagePropertyOrientation.up

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation, options: [:])
    do {
        try imageRequestHandler.perform(self.requests)
    } catch {
        print(error)
    }
}
@discardableResult
func setupVision() -> NSError? {
    // Setup Vision parts
    let error: NSError! = nil

    guard let modelURL = Bundle.main.url(forResource: "ResistorModel", withExtension: "mlmodelc") else {
        return NSError(domain: "VisionObjectRecognitionViewController", code: -1, userInfo: [NSLocalizedDescriptionKey: "Model file is missing"])
    }
    do {
        let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))
        let objectRecognition = VNCoreMLRequest(model: visionModel, completionHandler: { (request, error) in
            DispatchQueue.main.async(execute: {
                // perform all the UI updates on the main queue
                if let results = request.results {
                    self.drawVisionRequestResults(results)
                }
            })
        })
        self.requests = [objectRecognition]
    } catch let error as NSError {
        print("Model loading went wrong: \(error)")
    }

    return error
}
func drawVisionRequestResults(_ results: [Any]) {
    var pipCreated = false
    CATransaction.begin()
    CATransaction.setValue(kCFBooleanTrue, forKey: kCATransactionDisableActions)
    detectionOverlay.sublayers = nil // remove all the old recognized objects
    for observation in results where observation is VNRecognizedObjectObservation {
        guard let objectObservation = observation as? VNRecognizedObjectObservation else {
            continue
        }
        // Select only the label with the highest confidence.
        let topLabelObservation = objectObservation.labels[0]
        if topLabelObservation.identifier == "resistor" {
            if (objectObservation.boundingBox.minX < 0.5) && (objectObservation.boundingBox.maxX > 0.5)
                && (objectObservation.boundingBox.minY < 0.3) && (objectObservation.boundingBox.maxY > 0.3) {
                // Empirical workaround: flip Y and apply the extra 0.2 offset (0.8 = 1.0 - 0.2).
                let bb = CGRect(x: objectObservation.boundingBox.minX,
                                y: 0.8 - objectObservation.boundingBox.maxY,
                                width: objectObservation.boundingBox.width,
                                height: objectObservation.boundingBox.height)
                //let bb = CGRect(x: 0.5, y: 0.5, width: 0.5, height: 0.5)
                //let objectBounds = VNImageRectForNormalizedRect(bb, 500, 500)
                // The overlay is square, so its width is used for both dimensions.
                let objectBounds = VNImageRectForNormalizedRect(bb, Int(detectionOverlay.bounds.width), Int(detectionOverlay.bounds.width))
                print(objectObservation.boundingBox)

                let textLayer = self.createTextSubLayerInBounds(objectBounds,
                                                                identifier: topLabelObservation.identifier,
                                                                confidence: topLabelObservation.confidence)
                let shapeLayer = self.createRoundedRectLayerWithBounds(objectBounds)
                shapeLayer.addSublayer(textLayer)
                detectionOverlay.addSublayer(shapeLayer)

                if !pipCreated {
                    pipCreated = true
                    let pip = Pip(imageBuffer: self.imageBuffer!)
                    if self.pip {
                        pipView.image = pip?.uiImage
                    } else {
                        pipView.image = nil
                    }
                }
            }
        }
    }
    CATransaction.commit()
    doingStuff = false
}
Upvotes: 0
Views: 1791
Reputation: 565
I'm not sure why it behaved as it did. However, I would like it to have used the whole image for the object detection and to have returned bounding boxes normalized to the original portrait input. Note also that the model was trained this way.
There is a thread, https://github.com/apple/turicreate/issues/1016, covering this exact issue. The example does not work, and it does not work when you change the model either.
The solution, towards the end of the post, says to use...
objectRecognition.imageCropAndScaleOption = .scaleFill
This made the detection use the whole image and produced bounding boxes that were normalized to the whole image. No more arbitrary offset. It may be that the training geometry and the detection geometry have to be the same for the bounding box to be calculated correctly, although I'm not sure why.
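Putting it together, the request setup with this option and the straightforward box conversion look roughly like this (a sketch only; visionModel, detectionOverlay and self.requests are the ones from the code in the question):

import Vision
import UIKit

// With .scaleFill the whole frame is passed to the model and boundingBox is
// normalized to that full frame, so only the usual Y flip is needed.
let objectRecognition = VNCoreMLRequest(model: visionModel) { request, _ in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    DispatchQueue.main.async {
        for observation in results {
            let box = observation.boundingBox
            let flipped = CGRect(x: box.minX,
                                 y: 1.0 - box.maxY,   // flip to top-left origin, no extra offset
                                 width: box.width,
                                 height: box.height)
            let objectBounds = VNImageRectForNormalizedRect(flipped,
                                                            Int(self.detectionOverlay.bounds.width),
                                                            Int(self.detectionOverlay.bounds.height))
            // ... create the text and shape layers from objectBounds as before ...
        }
    }
}
objectRecognition.imageCropAndScaleOption = .scaleFill
self.requests = [objectRecognition]

Because the results now cover the full frame rather than a centered square, the overlay's width and height are both used when converting the normalized rect.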
Upvotes: 0