Reputation: 6039
From the Document AI docs, my understanding is that the Form Parser processor is the best fit for extracting information from a report such as a medical test result. It works well for fields that have exactly one value per label, such as patient name or patient age. However, I am trying to get the table of test results as a map of key-value pairs, where the key is the test name and the value is the result.
With a custom processor, I tried defining a label with a property that can appear multiple times, but that does not preserve the link between testName and testValue.
The report looks like the following:
The desired result would be something like:
{
  "name": "Jon Doe",
  "age": 76,
  "tests": [
    {
      "testName": "CRP",
      "testValue": 51
    },
    {
      "testName": "Creatinine",
      "testValue": 0.8
    }
  ]
}
I think this would be similar to how tables are handled in the response: https://cloud.google.com/document-ai/docs/handle-response
Upvotes: 0
Views: 1117
Reputation: 2234
The Form Parser processor supports table parsing when it can detect tables in the document. This sample code shows how the formFields and tables can be extracted.
https://cloud.google.com/document-ai/docs/handle-response#forms_and_tables
This Form Parser codelab also shows a few more examples, such as transforming the formFields and tables into a Pandas DataFrame.
https://codelabs.developers.google.com/codelabs/docai-form-parser-v1-python
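As a rough sketch of what that table handling can look like, the snippet below walks the JSON (dict) form of a Document AI response and turns each table body row into a dict keyed by the header cells. It assumes the proto3 JSON field names (`pages`, `tables`, `headerRows`, `bodyRows`, `cells`, `layout.textAnchor.textSegments`) and that the Form Parser actually detected the test results as a table. The helper names `layout_text`, `tables_to_records`, and `_cell`, and the hand-built `sample_document`, are made up for illustration and are not part of the client library.

```python
def layout_text(layout, full_text):
    """Resolve a layout's textAnchor into the matching slice(s) of the document text."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # In the JSON form, indices are int64 values serialized as strings,
    # and startIndex is omitted when it is 0.
    return "".join(
        full_text[int(seg.get("startIndex", 0)):int(seg["endIndex"])]
        for seg in segments
    ).strip()


def tables_to_records(document):
    """Return one {header: cell_text} dict per body row of every detected table."""
    text = document.get("text", "")
    records = []
    for page in document.get("pages", []):
        for table in page.get("tables", []):
            header_rows = table.get("headerRows", [])
            if not header_rows:
                continue  # without a header row there are no keys to map to
            headers = [layout_text(c["layout"], text) for c in header_rows[0]["cells"]]
            for row in table.get("bodyRows", []):
                cells = [layout_text(c["layout"], text) for c in row["cells"]]
                records.append(dict(zip(headers, cells)))
    return records


def _cell(full_text, word):
    """Hypothetical helper: build a table cell pointing at `word` inside `full_text`."""
    start = full_text.index(word)
    segment = {"startIndex": str(start), "endIndex": str(start + len(word))}
    return {"layout": {"textAnchor": {"textSegments": [segment]}}}


# Hand-built stand-in for a real response, only to make the sketch runnable.
_text = "Test Result\nCRP 51\nCreatinine 0.8\n"
sample_document = {
    "text": _text,
    "pages": [{
        "tables": [{
            "headerRows": [{"cells": [_cell(_text, "Test"), _cell(_text, "Result")]}],
            "bodyRows": [
                {"cells": [_cell(_text, "CRP"), _cell(_text, "51")]},
                {"cells": [_cell(_text, "Creatinine"), _cell(_text, "0.8")]},
            ],
        }]
    }],
}

records = tables_to_records(sample_document)
print(records)
```

Each record keeps the test name and its value together in one row, which is the link the question is after. Renaming the keys to testName and testValue is then a simple post-processing step.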
You can also create a Custom Document Extractor processor that makes a custom model for the specific document structure, but you will have to label example documents and train a new version.
Note that this creates an Entity Extraction processor, which works differently from the Form Parser (and doesn't currently extract form fields and tables in the same way).
You'll need to label each entity individually, train the processor, and use this sample code to get the entity information from the processing response.
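For the entity route, one way to keep testName and testValue linked is to label them as child properties of a repeated parent entity. The sketch below reads the JSON (dict) form of a response and groups each parent's `properties` into one dict. It assumes a hypothetical schema with a parent entity type `test` whose children are `testName` and `testValue`; whether your trained processor returns nested entities this way depends on how the schema was defined. The function name `entities_to_report` and the `sample_response` data are illustrative only.

```python
def entities_to_report(document, parent_type="test"):
    """Collect top-level entities as fields and nested ones as a list of rows."""
    report = {"tests": []}
    for entity in document.get("entities", []):
        properties = entity.get("properties", [])
        if entity.get("type") == parent_type and properties:
            # Some processors report child types as "parent/child";
            # keeping only the last path segment covers both forms.
            row = {
                prop["type"].split("/")[-1]: prop.get("mentionText", "").strip()
                for prop in properties
            }
            report["tests"].append(row)
        else:
            report[entity["type"]] = entity.get("mentionText", "").strip()
    return report


# Hand-built stand-in for a processor response, using the assumed schema.
sample_response = {
    "entities": [
        {"type": "name", "mentionText": "Jon Doe"},
        {"type": "age", "mentionText": "76"},
        {"type": "test", "properties": [
            {"type": "testName", "mentionText": "CRP"},
            {"type": "testValue", "mentionText": "51"},
        ]},
        {"type": "test", "properties": [
            {"type": "test/testName", "mentionText": "Creatinine"},
            {"type": "test/testValue", "mentionText": "0.8"},
        ]},
    ]
}

report = entities_to_report(sample_response)
print(report)
```

All values come back as strings, so converting age or testValue to numbers would be a separate step.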
Upvotes: 1