PPrasai

Reputation: 1186

How to group blocks that are part of a bigger sentence in the Google Cloud Vision API?

I am using the Google Cloud Vision API from Python to detect text on the hoarding boards that are usually found above a shop/store. So far I have been able to detect individual words and the coordinates of their bounding polygons. Is there a way to group the detected words based on their relative positions and sizes?

For example, the name of the store is usually written in a single size, with its words aligned. Does the API provide any functions that group words that are probably parts of a bigger sentence (the store name, the address, etc.)?

If the API does not provide such functions, what would be a good approach to group them? The following is an example image and what I have done so far:

[Image: shop's banner]

Vision API output excerpt:

description: "SHOP"
bounding_poly {
  vertices {
    x: 4713
    y: 737
  }
  vertices {
    x: 5538
    y: 737
  }
  vertices {
    x: 5538
    y: 1086
  }
  vertices {
    x: 4713
    y: 1086
  }
}
, description: "OVOns"
bounding_poly {
  vertices {
    x: 6662
    y: 1385
  }
  vertices {
    x: 6745
    y: 1385
  }
  vertices {
    x: 6745
    y: 1402
  }
  vertices {
    x: 6662
    y: 1402
  }
}

Upvotes: 0

Views: 2365

Answers (1)

Armin_SC

Reputation: 2270

I suggest taking a look at the TextAnnotation response format, which is returned when you use the DOCUMENT_TEXT_DETECTION feature for an OCR request. This response contains detailed information about the image metadata and the text content, which can be used to group the text by block, paragraph, word, etc., as described in the public documentation:

TextAnnotation contains a structured representation of OCR extracted text. The hierarchy of an OCR extracted text structure is like this: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol
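To make that hierarchy concrete, here is a minimal sketch of walking it in Python. It assumes the response has already been converted to a plain dict that uses the REST JSON field names (`pages`, `blocks`, `paragraphs`, `words`, `symbols`, `boundingBox`). The helper name `paragraphs_from_annotation` and the sample data are made up for illustration, and joining words with single spaces is a simplification that ignores the detected break information.

```python
def paragraphs_from_annotation(full_text_annotation):
    """Return one (text, vertices) tuple per paragraph.

    `full_text_annotation` is assumed to be the `fullTextAnnotation`
    part of a response, as a plain dict with REST JSON field names.
    """
    results = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                words = []
                for word in paragraph.get("words", []):
                    # A word is a sequence of symbols (characters).
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
                box = paragraph.get("boundingBox", {})
                results.append((" ".join(words), box.get("vertices", [])))
    return results


# Toy input (not real API output): one paragraph with two words.
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "boundingBox": {"vertices": [
                    {"x": 10, "y": 10}, {"x": 200, "y": 10},
                    {"x": 200, "y": 60}, {"x": 10, "y": 60},
                ]},
                "words": [
                    {"symbols": [{"text": "M"}, {"text": "Y"}]},
                    {"symbols": [{"text": "S"}, {"text": "H"},
                                 {"text": "O"}, {"text": "P"}]},
                ],
            }]
        }]
    }]
}

for text, vertices in paragraphs_from_annotation(sample):
    print(text, vertices)
```

Each paragraph comes back with its own bounding box, so a store name that the API recognised as one paragraph would already arrive grouped.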

Additionally, you can follow this useful example, which shows how to organize the text extracted from a receipt image by processing the fullTextAnnotation response content.
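If the API's own blocks and paragraphs do not match the groups you need, you could also group the word boxes yourself. The following is only a rough heuristic sketch, not the method used in the linked example: it puts two words in the same group when their box heights are similar and their vertical centres are close. The function names and the tolerances (`height_tol`, `center_tol`) are assumptions you would need to tune, and the "BIG" word in the usage data is invented to sit next to "SHOP" from the question.

```python
def box_stats(vertices):
    """Return (left, center_y, height) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), (min(ys) + max(ys)) / 2, max(ys) - min(ys)


def group_words(words, height_tol=0.3, center_tol=0.5):
    """Greedily group (text, vertices) pairs by size and vertical position.

    A word joins the first group whose first member has a similar box
    height and a nearby vertical centre; otherwise it starts a new group.
    """
    groups = []  # each group is a list of (left, center_y, height, text)
    for text, vertices in words:
        left, cy, h = box_stats(vertices)
        for group in groups:
            _, gcy, gh, _ = group[0]
            scale = max(h, gh)
            similar_height = abs(h - gh) <= height_tol * scale
            same_line = abs(cy - gcy) <= center_tol * scale
            if similar_height and same_line:
                group.append((left, cy, h, text))
                break
        else:
            groups.append([(left, cy, h, text)])
    # Order the words in each group from left to right.
    return [" ".join(t for _, _, _, t in sorted(g)) for g in groups]


words = [
    ("SHOP", [(4713, 737), (5538, 737), (5538, 1086), (4713, 1086)]),
    ("OVOns", [(6662, 1385), (6745, 1385), (6745, 1402), (6662, 1402)]),
    ("BIG", [(3800, 740), (4600, 740), (4600, 1080), (3800, 1080)]),  # invented
]
print(group_words(words))
```

This only handles roughly horizontal text on a single line per group. Rotated banners or multi-line names would need something more careful, such as comparing baselines or clustering on several features.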

Upvotes: 1
