Reputation: 21

Building a bounding box surrounding text in Google Vision API to extract the text

I am currently trying out the Google Vision API to extract text from an image of a form. The API extracts everything on the form, even though I set up an ROI around the specific text I want. Is there a way to extract only the text at a specific location instead of the text from the whole image?

Upvotes: 2

Answers (2)

Siddharth Raj

Reputation: 131

def get_text_within(document, x1, y1, x2, y2):
    """Return the text of every symbol whose bounding box lies fully
    inside the rectangle (x1, y1)-(x2, y2).

    `document` is the full text annotation returned by document text detection.
    """
    text = ""
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        xs = [v.x for v in symbol.bounding_box.vertices]
                        ys = [v.y for v in symbol.bounding_box.vertices]
                        # Skip symbols that are not fully inside the region, so
                        # that their trailing breaks are not added either.
                        if not (min(xs) >= x1 and max(xs) <= x2 and
                                min(ys) >= y1 and max(ys) <= y2):
                            continue
                        text += symbol.text
                        # Re-insert the break detected after this symbol.
                        # BreakType: 1 = SPACE, 2 = SURE_SPACE,
                        # 3 = EOL_SURE_SPACE, 5 = LINE_BREAK.
                        # Note: newer client library versions may expose this
                        # field as `type_` instead of `type`.
                        break_type = symbol.property.detected_break.type
                        if break_type in (1, 3):
                            text += ' '
                        elif break_type == 2:
                            text += '\t'
                        elif break_type == 5:
                            text += '\n'
    return text

Upvotes: 0

Patricio

Reputation: 60

The Google Vision API cannot extract text from only a specific location of an image; it always extracts the text from the whole image. If you only want the text from a specific location, you could crop the image before passing it to the API. Another option is to filter the results of the API call using the positions of the four bounding vertices associated with each piece of text.
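
As a rough illustration of the filtering option, here is a minimal sketch that works on the JSON form of the response, assuming the usual `textAnnotations` list where each entry has a `description` and a `boundingPoly` with four `vertices`. The helper names (`box_center`, `text_in_region`, `make_word`) and the sample data are made up for this example, and keeping a word when the centre of its box falls inside the region is just one possible rule.

```python
def box_center(annotation):
    """Return the centre (x, y) of an annotation's bounding polygon."""
    vertices = annotation["boundingPoly"]["vertices"]
    # A coordinate equal to 0 can be omitted from the JSON, hence .get(..., 0).
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2


def text_in_region(annotations, x1, y1, x2, y2):
    """Join the words whose box centre lies inside (x1, y1)-(x2, y2)."""
    words = []
    # The first entry covers the whole detected text, so it is skipped.
    for annotation in annotations[1:]:
        cx, cy = box_center(annotation)
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            words.append(annotation["description"])
    return " ".join(words)


def make_word(text, left, top, right, bottom):
    """Build a hypothetical annotation entry for the sample data below."""
    return {
        "description": text,
        "boundingPoly": {"vertices": [
            {"x": left, "y": top}, {"x": right, "y": top},
            {"x": right, "y": bottom}, {"x": left, "y": bottom},
        ]},
    }


# Made-up sample standing in for a real API response.
sample = [
    make_word("Name: Alice\nDate:", 10, 10, 120, 120),  # full-text entry
    make_word("Name:", 10, 10, 60, 30),
    make_word("Alice", 70, 10, 120, 30),
    make_word("Date:", 10, 100, 60, 120),
]

print(text_in_region(sample, 0, 0, 200, 50))  # Name: Alice
```

Testing the centre rather than all four vertices keeps words that slightly overlap the edge of the region; use a stricter test if you need the whole box inside.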

You can find more info on what is possible to do with the Google Vision API here.

Upvotes: 0
