Reputation: 11

Is there a way to predict document title from Google Cloud Vision OCR?

What I need help with is a way to predict the document title from the OCR text that Google Cloud Vision extracts from a pdf/jpg file.

I have a jpg file which I am sending to the Vision API, and I get the OCR text back. For the image attached, how could I programmatically predict that the title of the document is "Piano Posture Checklist"?

Upvotes: 1

Answers (2)

wescpy

Reputation: 11167

You want to "predict document title." There are 2 possible scenarios here:

1. Either you want to predict the correct document title based on the title itself appearing somewhere in the document, or
2. You want to predict the title based on the (OCR'd) contents because the document didn't/doesn't come with a title.

For #1, I agree w/the response from Ricco: you should train a custom model just for your application, IOW use AutoML (well, AutoML Vision) rather than the stock Cloud Vision API, so the model learns to get the title out of an OCR'd doc, whether by title placement/location, font size, etc.

#2 is more advanced. You would probably have to use a pair of APIs: OCR with Cloud Vision (w/ or w/o AutoML), then NLU analysis of the text via Cloud Natural Language (or AutoML Natural Language if needed) to possibly autogenerate a title from the contents if a document didn't come w/one. I believe in this case your training will likely have to lean towards supervised learning, where you provide titles paired w/untitled documents in your training data.
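
As a much simpler stand-in for the trained approach described for #2, here is a rough sketch that builds a candidate title from the most salient entities in an entity-analysis result. It assumes you already have the JSON body returned by Cloud Natural Language's analyzeEntities call as a Python dict (a list under "entities", each with "name" and "salience"). The function name `title_from_entities` and the `max_terms` parameter are made up for illustration, and joining the top entities is only a heuristic, not a learned title generator.

```python
from typing import Any, Dict, List


def title_from_entities(entities_response: Dict[str, Any], max_terms: int = 3) -> str:
    """Join the most salient, distinct entity names into a candidate title.

    `entities_response` is assumed to be shaped like an analyzeEntities
    JSON response: {"entities": [{"name": ..., "salience": ...}, ...]}.
    """
    entities: List[Dict[str, Any]] = entities_response.get("entities", [])
    # Highest salience first; a missing salience is treated as 0.
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)

    seen = set()
    terms: List[str] = []
    for entity in ranked:
        name = entity.get("name", "").strip()
        key = name.lower()
        # Skip empty names and case-insensitive repeats.
        if name and key not in seen:
            seen.add(key)
            terms.append(name)
        if len(terms) == max_terms:
            break

    # Title-case each term so the result reads like a heading.
    return " ".join(term.title() for term in terms)
```

For real documents you would likely still need the supervised training described above; this only shows where the second API's output would plug in.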

Upvotes: 0

Ricco D

Reputation: 7287

The response you get when detecting text using the Vision API (TextAnnotation) is structured as TextAnnotation -> Page -> Block (text block, table block, etc.) -> Paragraph -> Word -> Symbol. The only additional properties on these are the detected language and the detected break (space, hyphen, line break). Thus the Vision API cannot predict anything as specific as the "Title" of the document. See the TextAnnotation reference.
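
To make that hierarchy concrete, here is a minimal sketch that walks a full text annotation in its JSON form (pages -> blocks -> paragraphs -> words -> symbols) and returns the paragraph whose words have the tallest bounding boxes, on the assumption that a title is usually set in the largest type. The function names (`guess_title`, `paragraph_text`, `box_height`) are hypothetical, and box height is only a rough proxy for font size; this is a heuristic, not something the API provides.

```python
from typing import Any, Dict


def paragraph_text(paragraph: Dict[str, Any]) -> str:
    """Rebuild a paragraph's text by joining symbols into words."""
    words = []
    for word in paragraph.get("words", []):
        words.append("".join(s.get("text", "") for s in word.get("symbols", [])))
    return " ".join(words)


def box_height(bounding_box: Dict[str, Any]) -> int:
    """Vertical extent of a boundingBox; a missing y is treated as 0."""
    ys = [v.get("y", 0) for v in bounding_box.get("vertices", [])]
    return max(ys) - min(ys) if ys else 0


def guess_title(full_text_annotation: Dict[str, Any]) -> str:
    """Return the paragraph with the largest average word height."""
    best_text, best_score = "", -1.0
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                heights = [
                    box_height(w.get("boundingBox", {}))
                    for w in paragraph.get("words", [])
                ]
                if not heights:
                    continue
                score = sum(heights) / len(heights)
                # Strictly greater, so the earliest paragraph wins ties.
                if score > best_score:
                    best_text, best_score = paragraph_text(paragraph), score
    return best_text
```

You would pass it the "fullTextAnnotation" object from the response. It will pick the wrong paragraph on documents with large logos, banners, or headers, which is where a trained model becomes worthwhile.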

If you want to predict something as specific as the "Title" of a document/image, I suggest using AutoML Vision, where you can train a model to predict the "Title" given a set of properly labeled documents/images. Once training is done, you can send a prediction request to predict the "Title".

You can refer to this document for an example on how to prepare a dataset, train a model and predict.

Upvotes: 1
