Reputation: 11399
I am using Document AI with a Custom Extractor. When I create a new Custom Extractor, it offers to manage my dataset.
I expect that doing so will automatically create label names for the documents I upload for this task.
Also, it offers "Auto-label". I hope that this will even automatically generate the label names for me, guaranteeing some kind of consistency between different Custom Extractors.
I checked the "hint" button shown next to it, and it confirmed my thoughts:
When I check auto-labeling, I am asked to select a 'Version'. The only version that I am able to select in this case is 'pretrained-foundation-model-v1.0-2023-08-22'.
I do this because I expect the foundation model to be capable of assigning label names to my documents automatically.
My documents upload fine, but then I am shown this message:
{
  "name": "projects/xxxxxxxxx/locations/xxxxxxx/operations/xxxxxxx",
  "done": true,
  "result": "error",
  "response": {},
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.documentai.uiv1beta3.ImportDocumentsMetadata",
    "commonMetadata": {
      "state": "FAILED",
      "createTime": "202x-xxx-xxT01:xx:45.367220Z",
      "updateTime": "202x-xxx-xxT01:xx:57.243001Z",
      "resource": "projects/xxxxxxx/locations/xxxxxx/processors/xxxxxxxxx/dataset"
    },
    "totalDocumentCount": 142
  },
  "error": {
    "code": 3,
    "message": "No valid schema provided for processing.",
    "details": []
  }
}
What do I have to do there?
Upvotes: 0
Views: 400
Reputation: 11
I only started using Document AI relatively recently myself, but from my understanding the import fails because you still need to create the schema of labels, i.e. the schema lets the model know what to look for. Otherwise it would probably label everything, which might not be the desired behaviour.
The pre-trained model is able to auto-label a document, so generally if you give a label a name that makes sense, it'll pick what it thinks is the related field. In the example, the schema labels would be supplier_name, receiver_name and ship_to_address.
If you click into one of your imported documents, there should be a "create new field" option on the left. You will need to set the label name and how many times you think the field will occur in the document (single vs. multiple occurrence). Once you set this, it should pick up the field, though you can also adjust what it picks up via the UI.
You can also add the fields directly to the schema (in "manage dataset"), but I like to look at a few of the imported files and add the labels (to be collected) there. Hope that helps.
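If you would rather see what such a schema amounts to as data, here is a minimal sketch that only builds and prints a plain dictionary, without calling any Google API. Treat the details as my assumptions: the key names (entityTypes, baseTypes, properties, valueType, occurrenceType) follow my reading of the Document AI DocumentSchema message, the root type name custom_extraction_document_type and the "string" value type are what I believe a Custom Extractor uses, and build_schema is a hypothetical helper of my own. The three labels are the ones from the example above. Check your own processor's schema before relying on any of it.

```python
import json

# Occurrence types as I understand the Document AI schema enum (assumption).
ALLOWED_OCCURRENCES = {
    "OPTIONAL_ONCE",
    "OPTIONAL_MULTIPLE",
    "REQUIRED_ONCE",
    "REQUIRED_MULTIPLE",
}


def build_schema(fields):
    """Hypothetical helper: turn (label, value_type, occurrence) tuples
    into a dictionary shaped like a Document AI document schema."""
    properties = []
    for name, value_type, occurrence in fields:
        if occurrence not in ALLOWED_OCCURRENCES:
            raise ValueError(f"unknown occurrence type: {occurrence}")
        properties.append(
            {
                "name": name,
                "valueType": value_type,
                "occurrenceType": occurrence,
            }
        )
    return {
        "entityTypes": [
            {
                # Root type name assumed for Custom Extractor processors.
                "name": "custom_extraction_document_type",
                "baseTypes": ["document"],
                "properties": properties,
            }
        ]
    }


# Labels taken from the example in this answer; occurrence choices are illustrative.
schema = build_schema(
    [
        ("supplier_name", "string", "OPTIONAL_ONCE"),
        ("receiver_name", "string", "OPTIONAL_ONCE"),
        ("ship_to_address", "string", "OPTIONAL_MULTIPLE"),
    ]
)

print(json.dumps(schema, indent=2))
```

The point is just that every label needs a name and an occurrence setting before auto-labeling has anything to match against. Whether you enter those in the UI or push them through the API is up to you; I have only done it through the UI.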
Upvotes: 1