I'm working on an iOS app in Swift for my senior project, and I'm integrating the Roboflow API to detect and extract grocery items from receipt images. The API correctly returns bounding box predictions, but I'm having trouble extracting text from those bounding boxes.
The problem:
I send the receipt image (Base64 encoded) to the Roboflow API and receive the bounding box predictions in response. While the API provides bounding box coordinates (for items like Date, Item, Subtotal, etc.), no text is being extracted from those bounding boxes. I suspect the issue might be related to the way I handle the API response or the timing of the OCR extraction.
Here’s the relevant portion of my Swift code:
let url = URL(string: "https://detect.roboflow.com/YOUR-MODEL/1?api_key=YOUR_API_KEY")!
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.setValue("application/x-www-form-urlencoded", forHTTPHeaderField: "Content-Type")
request.httpBody = "base64=\(encodedImageString)".data(using: .utf8)

let task = URLSession.shared.dataTask(with: request) { data, response, error in
    guard error == nil else {
        print("Error: \(String(describing: error))")
        return
    }
    guard let data = data else {
        print("No data received.")
        return
    }
    do {
        // Attempt to decode the response from the Roboflow API
        if let jsonResponse = try JSONSerialization.jsonObject(with: data, options: []) as? [String: Any],
           let predictions = jsonResponse["predictions"] as? [[String: Any]] {
            // Loop through the predictions and print bounding boxes
            for prediction in predictions {
                if let classType = prediction["class"] as? String,
                   let x = prediction["x"] as? Double,
                   let y = prediction["y"] as? Double,
                   let width = prediction["width"] as? Double,
                   let height = prediction["height"] as? Double {
                    print("Bounding box: Class: \(classType), x: \(x), y: \(y), width: \(width), height: \(height)")
                    // Attempt to extract text within the bounding boxes (this part is where the issue might be)
                } else {
                    print("No bounding box found in prediction.")
                }
            }
        } else {
            print("Invalid response from API.")
        }
    } catch {
        print("Failed to decode JSON: \(error)")
    }
}
task.resume()
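For context, this is the direction I'm planning for the OCR step once I have a bounding box: crop that region out of the original receipt UIImage and run it through Apple's Vision framework. This is only a sketch, not working code from my app; recognizeText(in:boundingBox:completion:) is a helper name I made up, and it assumes the crop rectangle is already expressed in the pixel coordinates of the original image.

import UIKit
import Vision

/// Crops `boundingBox` out of the receipt image and runs Vision text
/// recognition on that region, returning the recognized string (or nil).
func recognizeText(in image: UIImage, boundingBox: CGRect, completion: @escaping (String?) -> Void) {
    guard let cgImage = image.cgImage?.cropping(to: boundingBox) else {
        completion(nil)
        return
    }
    let request = VNRecognizeTextRequest { request, error in
        guard error == nil,
              let observations = request.results as? [VNRecognizedTextObservation] else {
            completion(nil)
            return
        }
        // Join the top candidate from each recognized text line.
        let text = observations
            .compactMap { $0.topCandidates(1).first?.string }
            .joined(separator: " ")
        completion(text)
    }
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // perform(_:) is synchronous, so run it off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([request])
        } catch {
            completion(nil)
        }
    }
}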
Question:
Is there a specific way I should be handling OCR extraction within bounding boxes?
Could there be a timing issue that is causing this? If so, how do I ensure the OCR process completes before proceeding?
Any guidance on what might be causing this issue or how to improve my approach would be greatly appreciated!
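In case it helps show where my head is at on the timing part: I'm considering kicking off one recognizeText call (the helper sketched above) per prediction, tracking them with a DispatchGroup, and only reading the results in the notify block. receiptImage stands for the original UIImage I encoded and sent to the API, and the CGRect math assumes Roboflow's x/y are the box center in pixels (which is what my predictions look like). Again, just a sketch of the approach I'm considering.

let ocrGroup = DispatchGroup()
let resultsQueue = DispatchQueue(label: "ocr.results")   // serializes writes from the OCR callbacks
var extractedText: [String: String] = [:]                // class name -> recognized text

for prediction in predictions {
    guard let classType = prediction["class"] as? String,
          let x = prediction["x"] as? Double,
          let y = prediction["y"] as? Double,
          let width = prediction["width"] as? Double,
          let height = prediction["height"] as? Double else { continue }

    // Convert the center-based box to a top-left-origin rect before cropping
    // (my assumption about the coordinate convention).
    let cropRect = CGRect(x: x - width / 2,
                          y: y - height / 2,
                          width: width,
                          height: height)

    ocrGroup.enter()
    recognizeText(in: receiptImage, boundingBox: cropRect) { text in
        resultsQueue.async {
            if let text = text {
                extractedText[classType] = text
            }
            ocrGroup.leave()
        }
    }
}

// Runs only after every bounding box has finished OCR.
ocrGroup.notify(queue: .main) {
    print("Extracted fields: \(extractedText)")
}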
Additional Info:
I'm using Swift for iOS development. The Roboflow model I’m using has classes like Date, Item, Subtotal, etc.
What I’ve tried: