Reputation: 769
I'm trying to parse a handwritten document with Google Cloud Document AI. The document contains Cyrillic characters, but Document AI occasionally recognizes words as Latin characters. Is there a way to specify the language of the document, so that it tries to recognize the words in that particular language regardless of the confidence?
Upvotes: 3
Views: 768
Reputation: 2234
There was a recent update to Document AI that supports the languageHints parameter, which allows you to specify a language. Note: This only works when using the v1beta3 endpoint with the Document OCR processor at this time.
If the language is supported, then provide the BCP-47 code for the language in the processOptions field when sending the processing request.
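To make the request shape concrete, here is a minimal sketch in Python that only builds the JSON body and URL for a REST `:process` call, without sending anything. The nesting `processOptions.ocrConfig.hints.languageHints` and the URL pattern reflect my understanding of the v1beta3 REST API and are not spelled out above, so check them against the current reference. The helper names (`build_process_request`, `process_url`), the project/processor IDs, and the `"ru"` hint are placeholders for illustration.

```python
from __future__ import annotations

import base64
import json


def build_process_request(content: bytes, mime_type: str,
                          language_hints: list[str]) -> dict:
    """Build a JSON-serializable body for a Document AI process request.

    Assumption: language hints live under
    processOptions.ocrConfig.hints.languageHints (BCP-47 codes).
    """
    return {
        "rawDocument": {
            # The REST API expects the file bytes as a base64 string.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        },
        "processOptions": {
            "ocrConfig": {
                "hints": {"languageHints": list(language_hints)},
            },
        },
    }


def process_url(project: str, location: str, processor: str) -> str:
    """Assumed v1beta3 endpoint for a regional Document OCR processor."""
    return (
        f"https://{location}-documentai.googleapis.com/v1beta3/"
        f"projects/{project}/locations/{location}/"
        f"processors/{processor}:process"
    )


if __name__ == "__main__":
    # Placeholder bytes; in practice read your scanned file from disk.
    body = build_process_request(b"%PDF-1.4 ...", "application/pdf", ["ru"])
    print(process_url("my-project", "us", "my-processor-id"))
    print(json.dumps(body, indent=2))
```

You would POST this body to the URL with an OAuth bearer token. As I understand it, the hint is exactly that, a hint: it biases recognition toward the given language rather than guaranteeing that no Latin characters are ever returned.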
Upvotes: 2
Reputation: 1818
These are the languages supported in Document AI.
Currently it's not possible to tell Document AI which language to use when recognizing the words in a document. It can only detect the language.
If you would like the option to specify the document language to be implemented, you can open a new feature request on the issue tracker describing your requirement.
Upvotes: 2