How to use the converter from GCP Document AI

Question

I am trying to use the Document AI converter to convert some JSON files to the Document AI JSON format, using the function described in this documentation:

https://cloud.google.com/document-ai/docs/samples/documentai-toolbox-convert-external-annotations

I used the example config.json from the GitHub page:

https://github.com/googleapis/python-documentai-toolbox/blob/d29ff95742269a95e1e96e047f0fa1268457292a/samples/sample-converter-configs/Azure/invoice-config.json

I also used the JSON annotations from Form Recognizer and the attached PDF. The only change I made to the config was replacing "pxl" with "inch", because that is the unit used in the JSON annotations.

For processor_id, I tried an Invoice Parser processor: a new one, and another that has some models I fine-tuned. I also tried the processor version IDs of some trained models, as well as pretrained-invoice-v1.3-2022-07-15.

The input bucket path has the format gs://convertion_input_test/azure_test/, and I put three files in the azure_test folder (sample-invoice.pdf, sample-invoice_annotations.json and sample-invoice_config.json). The output bucket path is gs://convertion_output_test/azure_test/.

When I run the convert_external_annotations_sample() function, I receive this output in all cases:

-------- Downloading Started --------
-------- Finished Downloading --------
-------- Converting Started --------
-------- Finished Converting --------
-------- Uploading Started --------
-------- Finished Uploading --------
-------- Schema Information --------
Unique Entity Types: []

And nothing is saved in the output bucket.

Is there something in the configuration that I did wrong? I checked, and the JSON annotations contain all the fields used in the config JSON. Do I need to change something in this file?

The PDF file looks like this:

sample-invoice.pdf

The sample-invoice_config.json:

{
    "entity_object":"analyzeResult.documentResults.0.fields",
    "page": {
        "height":"analyzeResult.readResults.0.height",
        "width":"analyzeResult.readResults.0.width"
    },
    "entity": {
        "type_":"analyzeResult.documentResults.0.fields:self",
        "mention_text":"text",
        "normalized_vertices":{
            "type":"3",
            "unit":"inch",
            "base":"boundingBox",
            "x":"x",
            "y":"y"
        }
    }
}
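
One way to double-check that the plain dotted paths in this config actually resolve against the annotations is a small standalone script. This is only a sketch: `resolve_path` is a hypothetical helper written for this check, not part of the Document AI Toolbox, and it may not match how the converter reads the config internally. The `annotations` dict is a trimmed-down stand-in for sample-invoice_annotations.json, with its nesting assumed from the paths in the config. The ":self" suffix used in "type_" is not handled here.

```python
from typing import Any


def resolve_path(data: Any, dotted_path: str) -> Any:
    """Follow a dot-separated path through nested dicts and lists.

    Hypothetical helper for this check only. Numeric segments are
    treated as list indices, everything else as dict keys.
    """
    current = data
    for segment in dotted_path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


# Trimmed-down stand-in for sample-invoice_annotations.json (structure assumed
# from the config paths; only the parts the config refers to are kept).
annotations = {
    "analyzeResult": {
        "readResults": [{"page": 1, "width": 8.5, "height": 11, "unit": "inch"}],
        "documentResults": [
            {
                "fields": {
                    "VendorName": {
                        "text": "CONTOSO LTD.",
                        "boundingBox": [0.5909, 0.6827, 2.3215, 0.6827,
                                        2.3215, 0.8644, 0.5909, 0.8644],
                    }
                }
            }
        ],
    }
}

# The plain paths taken from sample-invoice_config.json.
config_paths = {
    "entity_object": "analyzeResult.documentResults.0.fields",
    "page.height": "analyzeResult.readResults.0.height",
    "page.width": "analyzeResult.readResults.0.width",
}

# Record which config paths resolve; a missing key or index means the path is wrong.
resolved = {}
for name, path in config_paths.items():
    try:
        resolved[name] = resolve_path(annotations, path)
        print(f"{name}: OK")
    except (KeyError, IndexError, ValueError) as exc:
        resolved[name] = None
        print(f"{name}: could not resolve {path!r} ({exc!r})")
```

If all three paths resolve on the real annotations file as well, the path strings themselves are probably not what produces the empty entity list.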

And a part of the sample-invoice_annotations.json:

{
    "status": "succeeded",
    "createdDateTime": "2020-11-06T23:32:11Z",
    "lastUpdatedDateTime": "2020-11-06T23:32:20Z",
    "analyzeResult": {
        "version": "2.1.0",
        "readResults": [{
            "page": 1,
            "angle": 0,
            "width": 8.5,
            "height": 11,
            "unit": "inch"
        }],
        "pageResults": [{
            "page": 1,
            "tables": [{
                "rows": 3,
                "columns": 4,
                "cells": [{
                    "rowIndex": 0,
                    "columnIndex": 0,
                    "text": "QUANTITY",
                    "boundingBox": [0.4953,
                    5.7306,
                    1.8097,
                    5.7306,
                    1.7942,
                    6.0122,
                    0.4953,
                    6.0122]
                },
                {
                    "rowIndex": 0,
                    "columnIndex": 1,
                    "text": "DESCRIPTION",
                    "boundingBox": [1.8097,
                    5.7306,
                    5.7529,
                    5.7306,
                    5.7452,
                    6.0122,
                    1.7942,
                    6.0122]
                },
                {
                    "rowIndex": 0,
                    "columnIndex": 2,
                    "text": "UNIT PRICE",
                    "boundingBox": [5.7529,
                    5.7306,
                    6.8045,
                    5.7306,
                    6.8122,
                    6.0122,
                    5.7452,
                    6.0122]
                },

......
......
......

                "VendorName": {
                    "type": "string",
                    "valueString": "CONTOSO LTD.",
                    "text": "CONTOSO LTD.",
                    "boundingBox": [0.5909,
                    0.6827,
                    2.3215,
                    0.6827,
                    2.3215,
                    0.8644,
                    0.5909,
                    0.8644],
                    "page": 1,
                    "confidence": 0.998
                }
            }
        }]
    }
}
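
For reference, this is roughly what I expect the "normalized_vertices" mapping to produce from a flat boundingBox like the one above. It is a minimal sketch under my own assumptions: the list alternates x and y values (x1, y1, x2, y2, ...) and each coordinate is divided by the page width or height so it falls between 0 and 1. `normalize_bounding_box` is a hypothetical name, not the converter's actual code.

```python
from typing import List, Sequence, Tuple


def normalize_bounding_box(
    box: Sequence[float], page_width: float, page_height: float
) -> List[Tuple[float, float]]:
    """Turn a flat [x1, y1, x2, y2, ...] box into (x, y) pairs in the 0..1 range.

    Hypothetical helper: assumes the box and the page size use the same unit
    (inches in the sample annotations).
    """
    if len(box) % 2 != 0:
        raise ValueError("boundingBox must contain an even number of values")
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    xs = box[0::2]  # even positions hold x coordinates
    ys = box[1::2]  # odd positions hold y coordinates
    return [(x / page_width, y / page_height) for x, y in zip(xs, ys)]


# VendorName box and page size taken from the annotations above.
vendor_box = [0.5909, 0.6827, 2.3215, 0.6827, 2.3215, 0.8644, 0.5909, 0.8644]
vertices = normalize_bounding_box(vendor_box, page_width=8.5, page_height=11)

for x, y in vertices:
    print(f"x={x:.4f}, y={y:.4f}")
```

All four vertices stay inside the 0 to 1 range with "inch" as the unit, which is why I changed the unit in the config instead of leaving it as "pxl".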
