Doug Niccum

Reputation: 225

Google Document AI training fails with errors for fields that don't exist

I am currently training a new document processor with Google's Document AI. I have 16 training documents and 10 testing documents, which is easily within the minimums documented by Google. However, when I attempt to train the processor, I keep getting errors that either reference input types that don't exist or indicate that I don't have enough annotated labels, even though I have verified that every document I provided is labeled appropriately and that the label counts fall within the defined minimums.

As I have seen on Stack Overflow, the errors that people report are very ambiguous, and I am seeing this as well. I have tried training the processor 4 different times, with the same errors each time. Any help would be appreciated.

Incorrect input types

This is a sample of the first kind of error. The invalid document error cites num_fields in its metadata. However, I don't have any field named num_fields in my schema.

"documentErrors": [
        {
          "code": 3,
          "message": "Invalid document.",
          "details": [
            {
              "@type": "type.googleapis.com/google.rpc.ErrorInfo",
              "reason": "INVALID_DOCUMENT",
              "domain": "documentai.googleapis.com",
              "metadata": {
                "annotation_name": "product_inventory_result/reorder_point",
                "field_name": "entities.text_anchor.text_segments",
                "num_fields": "0",
                "num_fields_needed": "1",
                "document": "3ef767351034410f.json"
              }
            }
          ]
        }
]

Invalid Dataset Errors

This error says that I only have 8 documents with annotations, which is incorrect. I have verified that I have 16 training documents and 10 testing documents, as I said before.

"datasetErrors": [
        {
          "code": 3,
          "message": "Invalid dataset.",
          "details": [
            {
              "@type": "type.googleapis.com/google.rpc.ErrorInfo",
              "reason": "INVALID_DATASET",
              "domain": "documentai.googleapis.com",
              "metadata": {
                "num_documents_with_annotation": "8",
                "num_documents_required": "10",
                "annotation_name": "DOCUMENTS_WITH_ENTITIES"
              }
            }
          ]
        }
]

Upvotes: 1

Views: 923

Answers (1)

Holt Skinner

Reputation: 2234

The issue seems to be that the dataset contains several documents with empty fields for product_inventory_result/reorder_point (and possibly other fields). The entities.text_anchor.text_segments value is empty, meaning that a bounding box was labeled but no text was found inside it. This is also the cause of the second error, INVALID_DATASET, because the dataset then doesn't have enough valid documents.
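To find which documents are affected, one option is to scan the exported document JSON files for entities whose text anchor has no text segments. The following is only a rough sketch, not an official tool. It assumes the documents are available locally as JSON files, that each file has a top-level entities list, and that nested labels such as product_inventory_result/reorder_point are stored as properties of a parent entity. The function names are made up for illustration, and both snake_case and camelCase key spellings are checked because the export format may use either.

```python
import json
from pathlib import Path


def _get(obj, snake, camel):
    """Return a field that may be spelled in snake_case or camelCase."""
    return obj.get(snake, obj.get(camel))


def entities_missing_text(entities, prefix=""):
    """Return names of leaf entities that have no text segments.

    Assumption: child labels live under the parent's "properties" list,
    so names are reported as "parent/child".
    """
    missing = []
    for entity in entities or []:
        name = prefix + entity.get("type", "<untyped>")
        children = entity.get("properties") or []
        if children:
            # Only leaf entities are checked; descend into the children.
            missing.extend(entities_missing_text(children, name + "/"))
            continue
        anchor = _get(entity, "text_anchor", "textAnchor") or {}
        segments = _get(anchor, "text_segments", "textSegments") or []
        if not segments:
            missing.append(name)
    return missing


def scan_folder(folder):
    """Map each JSON file name in `folder` to its entities without text."""
    report = {}
    for path in sorted(Path(folder).glob("*.json")):
        document = json.loads(path.read_text(encoding="utf-8"))
        missing = entities_missing_text(document.get("entities"))
        if missing:
            report[path.name] = missing
    return report
```

Calling scan_folder on the folder that holds the exported files (the folder path is up to you) returns a dictionary keyed by file name, so a file such as 3ef767351034410f.json would appear with the labels that need to be relabeled or removed. Any document listed there probably does not count toward the required number of documents with entities.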

Upvotes: 1
