Reputation: 1
From typified documents such as receipts and invoices, relevant information is extracted with OCR and templates. Afterwards, a person has to visually validate that the information was identified correctly and manually adjust it where needed. My task is to build a model that does this validation. I'm thinking of convolutional and pooling layers, with the inputs being the images, the coordinates of the bounding boxes where the extracted text was found, the extracted text, and the correct text. The goal is to train the network to make the corrections automatically where needed, based on correctly labeled training material. The project is in the design phase. I'm interested in insights regarding the input data or the layers. Thank you in advance and have a nice day.
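To make the input data more concrete, here is a rough sketch of how one training record could be laid out. Everything in it is an assumption for illustration: the `FieldSample` name, the field names, the `(x_min, y_min, x_max, y_max)` pixel convention for the box, and the example values are all made up, not taken from a real dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldSample:
    """One extracted field from one document (hypothetical layout)."""
    image_path: str                  # scanned receipt or invoice (made-up path)
    bbox: Tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max) in pixels
    ocr_text: str                    # text extracted by OCR and the template
    correct_text: str                # text after human validation

    @property
    def needs_correction(self) -> bool:
        # Validation label: True if the reviewer had to change the OCR text.
        return self.ocr_text != self.correct_text


# Two invented records from the same invoice image.
samples = [
    FieldSample("invoice_001.png", (40, 120, 310, 150), "T0TAL 12.50", "TOTAL 12.50"),
    FieldSample("invoice_001.png", (40, 160, 310, 190), "2021-03-04", "2021-03-04"),
]

# One boolean per field: does this extraction need a correction?
labels = [sample.needs_correction for sample in samples]
```

With a layout like this, the validation step (is the field right or wrong?) and the correction step (what should the text be?) can be derived from the same labeled records.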
Upvotes: 0
Views: 53
Reputation: 52
Your plan to build a model that validates the output after OCR has processed the typified documents seems a bit convoluted, since the OCR itself should already be doing what you hope to accomplish with your model.
Also, it doesn't seem feasible for a single model to both correct the OCR-extracted text and validate other factors such as whether the bounding boxes are correct.
Perhaps you should train an individual model for each of these use cases.
I would advise you to simplify the objective for this new model to something like correcting text that was misread when the OCR converted the image to text.
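As a minimal sketch of what the simplified, text-only objective could look like, the snippet below turns an (OCR text, correct text) pair into character-level edit operations using Python's standard `difflib`. The `edit_ops` helper and the example strings are hypothetical; this only shows one way to derive correction targets from labeled pairs and is not a trained model.

```python
from difflib import SequenceMatcher


def edit_ops(ocr_text, correct_text):
    """Return the edits that turn ocr_text into correct_text.

    Each item is (operation, ocr_fragment, corrected_fragment);
    spans that already match are skipped.
    """
    matcher = SequenceMatcher(None, ocr_text, correct_text, autojunk=False)
    return [
        (tag, ocr_text[i1:i2], correct_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example: the OCR read the letter "O" as the digit "0".
ops = edit_ops("T0TAL", "TOTAL")  # [("replace", "0", "O")]
```

Counting these operations over the labeled material would show which confusions are most frequent, which helps decide whether a learned corrector is needed or a few substitution rules already cover most errors.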
Upvotes: 1