Reputation: 320
I'm experimenting with Google Document AI. Documentation from Google and other sources often states that Document AI can classify documents, not only extract data by label. However, I don't see how to achieve that.
Does anybody have any ideas on how to do it?
Upvotes: 1
Views: 1515
Reputation: 2234
Update on the product: Document AI now supports creating Custom Document Classifier processors in GA, which allow classification of custom document types. So you no longer need AutoML Image or Text Classification to classify documents that don't have a dedicated Specialized Splitter/Classifier.
Here are the instructions for creating one:
https://cloud.google.com/document-ai/docs/workbench/build-custom-classification-processor
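Once a classifier processor is trained and deployed, calling it could look roughly like the sketch below. This assumes the `google-cloud-documentai` Python client and that the classifier returns one entity per candidate class, with the class label in `type_` and a score in `confidence`. The helper names `top_class` and `classify_document`, and all IDs and paths passed in, are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def top_class(entities: Iterable) -> Optional[Tuple[str, float]]:
    """Return (label, confidence) of the highest-scoring entity, or None if empty."""
    best = max(entities, key=lambda e: e.confidence, default=None)
    if best is None:
        return None
    return best.type_, best.confidence


def classify_document(project_id: str, location: str, processor_id: str,
                      file_path: str, mime_type: str = "application/pdf"):
    # Imported lazily so top_class() can be used without the client installed.
    from google.cloud import documentai

    # Regional endpoint, e.g. "us-documentai.googleapis.com" or "eu-...".
    client = documentai.DocumentProcessorServiceClient(
        client_options={"api_endpoint": f"{location}-documentai.googleapis.com"}
    )
    name = client.processor_path(project_id, location, processor_id)

    with open(file_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type=mime_type)

    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    result = client.process_document(request=request)

    # Assumption: each entity is a candidate class for the document.
    return top_class(result.document.entities)
```

For example, `classify_document("my-project", "us", "my-processor-id", "scan.pdf")` would return a `(label, confidence)` pair, with those argument values being placeholders.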
Upvotes: 2
Reputation: 551
You can perform document classification using what are called Specialized Processors.
There is a codelab that explains how to work with those specialized processors (including document classification).
Another way to classify documents is to use Vertex AI AutoML image classification: you create a dataset of document images (i.e., scanned documents) and train a model that takes a new document image and predicts whether it is document type 1, type 2, type 3, etc.
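For the AutoML route, querying a trained and deployed model might look like the sketch below. It assumes the `google-cloud-aiplatform` SDK, a model already deployed to an endpoint, and the usual AutoML image classification response shape with `displayNames` and `confidences` lists. The helper names and the IDs passed in are hypothetical.

```python
import base64
from typing import Mapping, Optional, Tuple


def best_label(prediction: Mapping) -> Optional[Tuple[str, float]]:
    """Pick the (label, confidence) pair with the highest confidence, or None."""
    pairs = list(zip(prediction.get("displayNames", []),
                     prediction.get("confidences", [])))
    if not pairs:
        return None
    return max(pairs, key=lambda p: p[1])


def classify_image(project_id: str, location: str, endpoint_id: str, image_path: str):
    # Imported lazily so best_label() can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=location)
    endpoint = aiplatform.Endpoint(endpoint_id)

    # AutoML image models expect base64-encoded image bytes.
    with open(image_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")

    response = endpoint.predict(
        instances=[{"content": content}],
        parameters={"confidenceThreshold": 0.0, "maxPredictions": 5},
    )
    # Assumption: one prediction per instance, behaving like a mapping.
    return best_label(dict(response.predictions[0]))
```

Scanned PDFs would need to be converted to images (one per page) before being sent to such a model.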
Upvotes: 2