Reputation: 225
I am training a new document processor with Google's Document AI. I have 16 training documents and 10 test documents, which is comfortably above the minimums documented by Google. However, whenever I attempt to train the processor, I get errors that either refer to input types that don't exist or claim that I don't have enough annotated labels. I have verified that every document I provided is labeled appropriately and that the label counts meet the defined minimums.
As I have seen in other Stack Overflow questions, the errors people report are very ambiguous, and that is my experience as well. I have tried training the processor 4 separate times and got the same errors each time. Any help would be appreciated.
This is a sample of the per-document error I am getting. The invalid document error cites an invalid num_fields value, but I don't have any num_fields in my schema.
"documentErrors": [
  {
    "code": 3,
    "message": "Invalid document.",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "INVALID_DOCUMENT",
        "domain": "documentai.googleapis.com",
        "metadata": {
          "annotation_name": "product_inventory_result/reorder_point",
          "field_name": "entities.text_anchor.text_segments",
          "num_fields": "0",
          "num_fields_needed": "1",
          "document": "3ef767351034410f.json"
        }
      }
    ]
  }
]
This second error says that only 8 of my documents have annotations, which is incorrect. As I said above, I have verified that I have 16 training documents and 10 test documents.
"datasetErrors": [
  {
    "code": 3,
    "message": "Invalid dataset.",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "INVALID_DATASET",
        "domain": "documentai.googleapis.com",
        "metadata": {
          "num_documents_with_annotation": "8",
          "num_documents_required": "10",
          "annotation_name": "DOCUMENTS_WITH_ENTITIES"
        }
      }
    ]
  }
]
Upvotes: 1
Views: 923
Reputation: 2234
The issue seems to be that several documents in the dataset have empty fields for product_inventory_result/reorder_point (and possibly other fields). The entities.text_anchor.text_segments value is empty, meaning that a bounding box was labeled but no text was found inside it. This is also the cause of the second error, INVALID_DATASET, because the dataset doesn't have enough valid documents.
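To find the affected documents, a small script along these lines might help. It is only a sketch, and it rests on several assumptions: that the labeled documents are available locally as exported Document JSON files, that entities use either the camelCase keys of the JSON export (textAnchor, textSegments) or the snake_case names shown in the error, and that child fields such as reorder_point are nested under their parent's properties list. The names find_empty_entities and scan_folder are made up for this example and are not part of any Google library.

```python
import json
from pathlib import Path


def find_empty_entities(entities, prefix=""):
    """Return 'parent/child' type paths of leaf entities with no text segments."""
    found = []
    for entity in entities or []:
        name = prefix + entity.get("type", "?")
        # Accept both spellings: camelCase (JSON export) and snake_case (error text).
        anchor = entity.get("textAnchor") or entity.get("text_anchor") or {}
        segments = anchor.get("textSegments") or anchor.get("text_segments") or []
        children = entity.get("properties") or []
        # Assumption: only leaf entities are flagged; a parent entity is
        # checked through its nested properties instead.
        if not segments and not children:
            found.append(name)
        found.extend(find_empty_entities(children, name + "/"))
    return found


def scan_folder(folder):
    """Map each JSON file name in `folder` to its list of empty entity paths."""
    report = {}
    for path in sorted(Path(folder).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        empty = find_empty_entities(doc.get("entities"))
        if empty:
            report[path.name] = empty
    return report
```

Calling scan_folder on the folder that holds the exported files (the location depends on your setup) returns a dictionary of file names and the fields that have a label but no text. Relabeling those fields, or removing the empty labels, should let the documents count toward the minimum again.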
Upvotes: 1