Reputation: 33
I am fairly new to the Google Cloud Vision API, so my apologies if there is an obvious answer to this. For some images, I am getting different OCR results from the Google Cloud Vision API Drag and Drop page (https://cloud.google.com/vision/docs/drag-and-drop) and from local image detection in Python.
My code is as follows:
import io

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = "./test0004a.jpg"

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)
response = client.text_detection(image=image)
texts = response.text_annotations

print('Texts:')
for text in texts:
    # print('\n"{}"'.format(text.description.encode('utf-8')))
    print('\n"{}"'.format(text.description.encode('ascii','ignore')))
    vertices = (['({},{})'.format(vertex.x, vertex.y)
                 for vertex in text.bounding_poly.vertices])
    print('bounds: {}'.format(','.join(vertices)))
A sample image that highlights this is attached: Sample Image
The Python code above doesn't return anything, but the browser drag and drop correctly identifies "2340" as the text. Shouldn't both Python and the browser return the same result? If not, why not? Do I need to include additional parameters in the code?
Upvotes: 3
Views: 1765
Reputation: 8178
The issue here is that you are using TEXT_DETECTION instead of DOCUMENT_TEXT_DETECTION, which is the feature used by the Drag and Drop example page that you shared.
By changing the method to document_text_detection(), you should obtain the desired results (I tested it with your code, and it worked):
# Using TEXT_DETECTION
response = client.text_detection(image=image)
# Using DOCUMENT_TEXT_DETECTION
response = client.document_text_detection(image=image)
Although both methods can be used for OCR, as presented in the documentation, DOCUMENT_TEXT_DETECTION is optimized for dense text and documents. The image you shared is not very high quality and the text is not clear, so it may be that for this type of image DOCUMENT_TEXT_DETECTION performs better than TEXT_DETECTION.
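Since either feature can find text where the other returns nothing, one option is to try both in turn. The following is only a rough sketch, assuming google-cloud-vision 2.x or later (where vision.Image replaces the older types.Image). The helper names ocr_with_fallback and ocr_file are made up for illustration and are not part of the library.

```python
def ocr_with_fallback(client, image, prefer_document=True):
    """Return the detected text, trying a second OCR feature if the first finds none.

    `client` is expected to behave like vision.ImageAnnotatorClient and
    `image` like vision.Image (both are assumptions of this sketch).
    """
    methods = [client.document_text_detection, client.text_detection]
    if not prefer_document:
        methods.reverse()

    for method in methods:
        response = method(image=image)
        # Per-image failures are reported in response.error.
        if response.error.message:
            raise RuntimeError(response.error.message)
        annotations = response.text_annotations
        if annotations:
            # The first annotation holds the whole detected text.
            return annotations[0].description
    return ""


def ocr_file(path):
    """Hypothetical convenience wrapper: OCR a local image file."""
    # Imported lazily; assumes google-cloud-vision 2.x or later.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as image_file:
        image = vision.Image(content=image_file.read())
    return ocr_with_fallback(client, image)
```

Calling ocr_file("./test0004a.jpg") would try DOCUMENT_TEXT_DETECTION first and only fall back to TEXT_DETECTION if no text comes back. Note that the fallback costs a second API request for images where the first feature finds nothing.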
See some other examples where DOCUMENT_TEXT_DETECTION worked better than TEXT_DETECTION. Note, however, that this will not always hold, and TEXT_DETECTION may still give better results under certain conditions:
Upvotes: 5