Neil

Reputation: 6039

Detecting a map of key value pairs using Document AI

From the Document AI docs, I understand that the best way to extract information from a report such as a medical test result is the Form Parser processor. It does a good job on reports where each label has exactly one value, such as patient name or patient age. However, I want to get the table of test results as a map of key-value pairs, where the key is the test name and the value is the result.

With a custom processor, I tried defining a label with a property that can appear multiple times, but that does not preserve the link between each testName and its testValue.

The report looks like the following (image not included).

The desired result would be something like:

{
  "name": "Jon Doe",
  "age": 76,
  "tests": [
    {
      "testName": "CRP",
      "testValue": 51
    },
    {
      "testName": "Creatinine",
      "testValue": 0.8
    }
  ]
}

I think the solution would be something similar to the table handling described here: https://cloud.google.com/document-ai/docs/handle-response

Upvotes: 0

Views: 1117

Answers (1)

Holt Skinner

Reputation: 2234

The Form Parser processor also parses tables when it can detect them in the document. This sample code shows how the formFields and tables can be extracted:

https://cloud.google.com/document-ai/docs/handle-response#forms_and_tables

This Form Parser Codelab also shows a few more examples, like transforming the formFields & Tables into a Pandas DataFrame.

https://codelabs.developers.google.com/codelabs/docai-form-parser-v1-python
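
For the Form Parser route, here is a minimal sketch of turning detected table rows into the testName/testValue list from the question. It assumes `document` is the Document returned by a Form Parser `process_document` call, and that the test name and the result sit in the first and second columns; the function names and the column positions are my assumptions, so adjust them to your report layout.

```python
def layout_text(layout, full_text):
    """Return the text that a layout element points to inside document.text."""
    return "".join(
        full_text[int(segment.start_index):int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    ).strip()


def tables_to_tests(document, name_col=0, value_col=1):
    """Map every table body row to {"testName": ..., "testValue": ...}.

    name_col and value_col are assumed column positions, not something
    Document AI reports; check them against header_rows for your documents.
    """
    tests = []
    for page in document.pages:
        for table in page.tables:
            for row in table.body_rows:
                cells = [layout_text(cell.layout, document.text) for cell in row.cells]
                # Skip rows that are too short to hold both columns.
                if len(cells) > max(name_col, value_col):
                    tests.append(
                        {"testName": cells[name_col], "testValue": cells[value_col]}
                    )
    return tests
```

The values come back as strings, so convert them to numbers yourself if you need them. Single-value fields such as name and age can still be read from each page's formFields as in the sample code linked above.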

You can also create a Custom Document Extractor processor that makes a custom model for the specific document structure, but you will have to label example documents and train a new version.

Note, this creates an Entity Extraction processor, which works differently than the Form Parser (and doesn't currently extract form fields & tables in the same way).

You'll need to label each entity individually, train the processor, and use this sample code to get the entity information from the processing response.

https://cloud.google.com/document-ai/docs/handle-response#entities_nested_entities_and_normalized_values
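
For the custom extractor route, one way to keep each test name tied to its value is to label them as child entities of a parent entity. The sketch below assumes a hypothetical schema with a parent label `test` whose children are `testName` and `testValue`, plus top-level labels such as `name` and `age`; those label names and the helper function are assumptions for illustration, not something the processor defines for you.

```python
def entities_to_report(document, group_type="test"):
    """Build the nested result from a processed document's entities.

    Entities of type `group_type` (a hypothetical parent label) contribute one
    dict per occurrence, built from their child properties; every other
    top-level entity is stored as a plain key/value pair.
    """
    report = {"tests": []}
    for entity in document.entities:
        if entity.type_ == group_type:
            # Child entities stay grouped under their parent, which preserves
            # the link between a test name and its value.
            report["tests"].append(
                {prop.type_: prop.mention_text for prop in entity.properties}
            )
        else:
            report[entity.type_] = entity.mention_text
    return report
```

Here `document` is the Document from the processing response; in the Python client the entity type is exposed as `type_`, and nested entities appear under `properties`.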

Upvotes: 1
