Abhishek Dasgupta

Reputation: 23

Does the Cloud Vision API have a way to return key-value pairs in its response, like its AWS Textract counterpart?

I need to access OCR data in a key-value format. Does the Google Cloud Vision API have a way to return key-value pairs in its response, as AWS Textract does?

We are currently getting back boundary coordinates, but that doesn't really help in the scenario we are working with.

Is there an out-of-the-box feature or a simple configuration setting we might be overlooking?

Upvotes: 1

Views: 388

Answers (1)

Ricco D

Reputation: 7287

I looked up AWS Textract, and GCP has a similar product called Document AI. Document AI can process simple documents and is also capable of processing specific types of forms, such as government forms and invoices.

I'm not familiar with how AWS Textract retrieves data, but the response in Document AI is structured like Document -> Pages -> (Paragraphs/Lines/Blocks) -> Layout -> Text Anchor -> Text Segment. In this structure, each Text Segment contains a startIndex and an endIndex. Using these values, you can slice Document.Text to get the actual text of the whole Paragraph/Line/Block.
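To make the index lookup above concrete, here is a minimal sketch that works on the JSON form of a response using plain dictionaries. The `document` below is a hand-written, hypothetical fragment, not real API output, and `layout_text` is a helper name invented for this illustration. It assumes the JSON encodes the indices as strings and omits `startIndex` when it is 0. The `formFields` part assumes a form-parsing processor was used; as far as I know that is where Document AI exposes key-value pairs (`fieldName` / `fieldValue`), but verify this against your own response.

```python
def layout_text(layout: dict, full_text: str) -> str:
    """Join the slices of full_text that a layout's text anchor points to."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for seg in segments:
        # Assumption: indices arrive as strings, and startIndex is omitted when 0.
        start = int(seg.get("startIndex", 0))
        end = int(seg.get("endIndex", 0))
        parts.append(full_text[start:end])
    return "".join(parts)


# Hypothetical response fragment, written by hand for illustration only.
document = {
    "text": "Name: Jane Doe\nDate: 2021-05-01\n",
    "pages": [
        {
            "paragraphs": [
                {"layout": {"textAnchor": {"textSegments": [
                    {"endIndex": "15"}]}}},
                {"layout": {"textAnchor": {"textSegments": [
                    {"startIndex": "15", "endIndex": "32"}]}}},
            ],
            # Assumption: present only when a form-parsing processor is used.
            "formFields": [
                {
                    "fieldName": {"textAnchor": {"textSegments": [
                        {"endIndex": "5"}]}},
                    "fieldValue": {"textAnchor": {"textSegments": [
                        {"startIndex": "6", "endIndex": "14"}]}},
                },
                {
                    "fieldName": {"textAnchor": {"textSegments": [
                        {"startIndex": "15", "endIndex": "20"}]}},
                    "fieldValue": {"textAnchor": {"textSegments": [
                        {"startIndex": "21", "endIndex": "31"}]}},
                },
            ],
        }
    ],
}

full_text = document["text"]

# Paragraph text: Document -> Pages -> Paragraphs -> Layout -> Text Anchor.
paragraphs = [
    layout_text(p["layout"], full_text).strip()
    for page in document["pages"]
    for p in page.get("paragraphs", [])
]

# Key-value pairs: resolve both sides of every form field the same way.
pairs = {
    layout_text(f["fieldName"], full_text).strip():
        layout_text(f["fieldValue"], full_text).strip()
    for page in document["pages"]
    for f in page.get("formFields", [])
}

print(paragraphs)
print(pairs)
```

The same helper works for Lines and Blocks, since each of them carries a Layout with the same Text Anchor structure.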

You can check a sample code implementation here to see how Document AI processes a document end to end.

Upvotes: 1
