Xavier Silva

Reputation: 267

Vision API - Force the API to analyze an image that is not perceived as a single text line

I've been working with the Google Vision API, but I have a problem I can't solve. This is the image I'm dealing with:

[image]

In the image above, the Google Vision API (the same happens with IBM Watson and Microsoft Cognitive Services) does not recognize 2,99€ as text to read, because the price is not laid out as a single line. The output is therefore anything but what I expect, which is the price on the label.

If I were using Tesseract, I would solve this with the -psm 7 option, which forces it to read the image as a single text line, but I can't find any documentation for an equivalent in the Google Vision API.
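For reference, here is roughly what I mean on the Tesseract side. This is only a sketch: the helper names and the `label.png` file name are made up for illustration. Page segmentation mode 7 is spelled `-psm 7` in Tesseract 3.x and `--psm 7` in 4.x and later, so the flag is switchable.

```python
import shutil
import subprocess


def tesseract_single_line_cmd(image_path, legacy_flag=False):
    """Build a Tesseract command that treats the image as one text line.

    Hypothetical helper; only the tesseract CLI arguments are real.
    """
    # Tesseract 3.x uses -psm, 4.x and later use --psm.
    psm_flag = "-psm" if legacy_flag else "--psm"
    # "stdout" as the output base prints the text instead of writing a file.
    return ["tesseract", image_path, "stdout", psm_flag, "7"]


def read_single_line(image_path):
    """Run Tesseract if it is installed; return the text, or None otherwise."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        tesseract_single_line_cmd(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `read_single_line("label.png")` would then return whatever Tesseract reads from the (ideally cropped) price area.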

Has anyone done something similar before? I cannot figure out how to solve this problem...

Upvotes: 1

Views: 224

Answers (1)

Tino A.

Reputation: 232

I have a similar problem, and it appears that the Vision API might not be the right fit for it. The API does not give you any information about the structure of the text it finds (other than the rectangle where the text was found), and in turn it does not take that structure into account either.
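To illustrate what that rectangle looks like: in the JSON response, each text annotation carries a `boundingPoly` with four `vertices`. A minimal sketch for turning one into a box, assuming the REST JSON shape (the `sample` annotation below is made-up data, not real API output, and `bounding_box` is my own helper name):

```python
def bounding_box(annotation):
    """Return (min_x, min_y, max_x, max_y) for one text annotation."""
    vertices = annotation["boundingPoly"]["vertices"]
    # Zero-valued coordinates can be omitted from the JSON, so default to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical annotation, shaped like an entry of "textAnnotations".
sample = {
    "description": "2,99",
    "boundingPoly": {
        "vertices": [
            {"x": 10, "y": 5},
            {"x": 60, "y": 5},
            {"x": 60, "y": 30},
            {"x": 10, "y": 30},
        ]
    },
}

print(bounding_box(sample))
```

That position is all you get to work with; any grouping of fragments into a "line" or a "price" has to be done on your side.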

AFAIK you can't solve this problem with the Vision API yet, although there might be some sort of solution in the future.

Right now there is the "ImageContext" part of the AnnotateImageRequest, which I hope will eventually be used for exactly what you are trying to do.
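For orientation, this is roughly where `imageContext` sits in a request body for the REST endpoint (`POST https://vision.googleapis.com/v1/images:annotate`). It is a sketch only: `build_annotate_request` is a name I made up, and `languageHints` is just a hint about the language. As far as I know it does not force single-line reading the way Tesseract's page segmentation mode does.

```python
import base64
import json


def build_annotate_request(image_bytes, language_hints=None):
    """Build the JSON body for an images:annotate text detection call."""
    request = {
        # Image data is sent base64-encoded in the "content" field.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }
    if language_hints:
        # Optional per-image context; only language hints are set here.
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


# Placeholder bytes; in practice this would be the contents of the image file.
body = build_annotate_request(b"abc", language_hints=["en"])
print(json.dumps(body, indent=2))
```

Sending the body (with an API key or OAuth token) is left out on purpose, since it does not change the point: there is no field in there to describe the expected text layout.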

Upvotes: 1
