Reputation: 267
I've been around Google Vision API but I have a problem I can't really solve. This is the image I'm dealing with:
In the image above, Google Vision API (the same happens with IBM Watson and Microsoft Cognitive Services) does not understand that 2,99€ should be read as a single line, so the output is everything except what I expect it to produce (the price on the label).
If I were using Tesseract, I would solve this with the -psm 7
option, which forces it to treat the image as a single text line, but I can't find any documentation for an equivalent in the Google Vision API.
Has anyone done something similar before? I cannot figure out how to solve this problem...
Upvotes: 1
Views: 224
Reputation: 232
I have a similar problem, and it appears that the Vision API might not be the right fit for this kind of task. The API does not give you any information about the structure of the detected text (other than the bounding rectangle where each piece of text was found) and consequently does not care about that structure either.
AFAIK you can't solve this problem with the Vision API yet, although there might be some sort of solution in the future.
Right now there is the "ImageContext" part of the AnnotateImageRequest, which I hope will eventually be used for exactly what you are trying to do.
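One workaround, since the API does return a bounding rectangle per word, is to reconstruct lines yourself by clustering words whose boxes vertically overlap and then sorting each cluster left to right. Below is a minimal sketch of that idea; the input dicts mimic the shape of the Vision API's word-level `textAnnotations` (a `description` plus `boundingPoly` vertices), but the sample data and the `group_into_lines` helper are my own, not part of the API:

```python
def center_y(vertices):
    """Mean y coordinate of the four bounding-poly vertices."""
    return sum(v["y"] for v in vertices) / len(vertices)

def height(vertices):
    ys = [v["y"] for v in vertices]
    return max(ys) - min(ys)

def group_into_lines(annotations, overlap=0.5):
    """Greedily assign each word to an existing line whose vertical
    center is within `overlap` * word height; otherwise start a new
    line. Words within a line are then sorted by their left edge."""
    lines = []  # each: {"y": line center, "h": line height, "words": [...]}
    for ann in sorted(annotations,
                      key=lambda a: center_y(a["boundingPoly"]["vertices"])):
        verts = ann["boundingPoly"]["vertices"]
        cy, h = center_y(verts), height(verts)
        for line in lines:
            if abs(line["y"] - cy) < overlap * max(h, line["h"]):
                line["words"].append(ann)
                break
        else:
            lines.append({"y": cy, "h": h, "words": [ann]})
    result = []
    for line in lines:
        line["words"].sort(
            key=lambda a: min(v["x"] for v in a["boundingPoly"]["vertices"]))
        result.append(" ".join(a["description"] for a in line["words"]))
    return result

# Made-up word boxes for a label: "PRICE" on top, "2,99" and "€" below
# at roughly the same height (the case the question describes).
words = [
    {"description": "€", "boundingPoly": {"vertices": [
        {"x": 60, "y": 40}, {"x": 80, "y": 40},
        {"x": 80, "y": 70}, {"x": 60, "y": 70}]}},
    {"description": "PRICE", "boundingPoly": {"vertices": [
        {"x": 10, "y": 0}, {"x": 50, "y": 0},
        {"x": 50, "y": 20}, {"x": 10, "y": 20}]}},
    {"description": "2,99", "boundingPoly": {"vertices": [
        {"x": 10, "y": 42}, {"x": 55, "y": 42},
        {"x": 55, "y": 68}, {"x": 10, "y": 68}]}},
]

print(group_into_lines(words))  # → ['PRICE', '2,99 €']
```

This is only a heuristic (skewed or rotated text would need a smarter distance measure), but it is usually enough to glue a price and its currency symbol back into one line.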
Upvotes: 1