I've created and trained a custom document extractor with GCP's Document AI and have noticed that it doesn't always detect the space between two sets of numbers, so it ends up merging them into one.
For example, the document shows 8 95, but the tool interprets it as 895, which is not accurate in this case.
I recognize that this may just be due to the document itself and that perhaps the spacing is not super obvious even to the human eye.
Also, as a side note, when training the extractor and using the bounding box tool, I noticed that it would often ignore the space as well, so I manually adjusted the box to include it. I was hoping that would help the model, but that doesn't seem to be the case.
Ultimately, it may come down to adding and annotating more training/testing documents, but I was hoping someone could provide some other insight if possible!
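To make the idea of a non-training workaround concrete, here is a minimal sketch of a post-processing check that re-inserts a space when the horizontal gap between neighbouring characters is unusually wide. It assumes character-level bounding boxes are available for the field, which I have not confirmed for custom extractors. The function name `join_with_gaps`, the `(char, x_min, x_max)` tuple layout, and the `gap_factor` threshold are all hypothetical and not part of the Document AI API.

```python
from statistics import median


def join_with_gaps(boxes, gap_factor=1.5):
    """Rebuild a string from character boxes on one text line.

    boxes: list of (char, x_min, x_max) tuples (hypothetical layout,
    e.g. derived from normalized bounding-box coordinates).
    A space is inserted wherever the gap between two neighbouring
    characters exceeds gap_factor times the median character width.
    """
    if not boxes:
        return ""
    # Order characters left to right by their left edge.
    boxes = sorted(boxes, key=lambda b: b[1])
    widths = [x_max - x_min for _, x_min, x_max in boxes]
    threshold = gap_factor * median(widths)

    out = [boxes[0][0]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[1] - prev[2]  # left edge of current minus right edge of previous
        if gap > threshold:
            out.append(" ")
        out.append(cur[0])
    return "".join(out)


# Made-up coordinates: a wide gap after "8", a narrow one between "9" and "5".
example = [("8", 0.10, 0.12), ("9", 0.17, 0.19), ("5", 0.195, 0.215)]
print(join_with_gaps(example))  # prints "8 95"
```

The threshold is relative to the median character width rather than a fixed distance, so the same setting should behave similarly across different font sizes. It would still need tuning on real documents, and it will not help where the gap is genuinely ambiguous on the page.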