Reputation: 1
I've been looking around for a while but have been unable to find someone describing exactly what I'm looking to accomplish.
Currently, I have about 25,000 images of old typewritten documents that I am looking to build a database from. Originally, I figured I would just be able to run these images through OCR software in one batch and work with the data from there. However, the format of the images makes it impossible for me to batch convert them using OCR software without losing much of the data in the documents. The orientation and placement of the relevant information differ from photo to photo, which prevents me from using a single template to tell the OCR software what information should be read in. Each photo consists of a sheet of paper with a table of information on it, and you can see some of the background around the edge of the sheet of paper.
What I'm interested in doing is finding a way to automatically re-orient and crop each image so that the table of information has the same position and orientation in every image. This way, I would be able to batch convert all images into the actual data using OCR software. If there is no way to do this automatically, converting the documents manually would take many hours.
I think that there might be a way of doing this using computer vision techniques, but I don't really know how feasible this is. These slides describe something similar to what I want to do, but not exactly. I would appreciate any advice on how I could go about accomplishing this.
Upvotes: 0
Views: 84
Reputation: 146
I really don't know if writing automated software is the way to go. Trust me, it would take you far less time to arrange all the documents manually than to write the code for it. As far as I can see, some sort of automatic boxing technique could be used, based on PCA or something along similar lines. However, if you are not a computer vision student or someone looking to learn the field, I highly recommend the manual method.
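To give a rough idea of what the PCA route would involve, here is a minimal sketch in plain Python. It assumes you have already separated the sheet of paper from the background and collected the (x, y) coordinates of the page pixels, which is the hard part on real photos and is not shown. The function names `principal_angle` and `deskew_and_box` are made up for illustration. The sketch only estimates the tilt of the page and the box it would occupy once straightened; it is not a complete pipeline.

```python
import math


def principal_angle(points):
    """Estimate the tilt (radians) of the dominant axis of a 2-D point set.

    This is PCA in two dimensions: the dominant axis is the leading
    eigenvector of the covariance matrix, computed here in closed form.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the symmetric 2x2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Angle of the leading eigenvector, in the range (-pi/2, pi/2].
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def deskew_and_box(points):
    """Rotate the points so the dominant axis is horizontal, then box them.

    Returns (angle, (min_x, min_y, max_x, max_y)); the box is expressed
    relative to the centroid of the points.
    """
    angle = principal_angle(points)
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Rotating by -angle undoes the estimated tilt.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    rotated = [
        ((x - cx) * cos_a - (y - cy) * sin_a,
         (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]
    xs = [x for x, _ in rotated]
    ys = [y for _, y in rotated]
    return angle, (min(xs), min(ys), max(xs), max(ys))
```

Note the limits of this: PCA only finds the long axis of the page, so it cannot tell a page from the same page turned upside down, and it is unreliable when the page is close to square. In image coordinates the y axis usually points down, so the sign of the angle has to be interpreted accordingly.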
Sorry for the discouraging answer, but sometimes you have to take the sour medicine. :-(
Upvotes: 1