Reputation: 6968
I have been developing an application for Android which uses Tesseract OCR (optical character recognition) and was wondering if there is a method for improving the results for small text.
I have tried recompiling the standard dictionary with my own frequent and normal word lists (using wordlist2dawg) and have seen no improvement (I can't even tell if it is helping!). I have also heard it is possible to alter the threshold at which Tesseract uses dictionary words, but I have no idea how to do this.
If anybody has an idea of how I could improve the results tesseract gives me I would really appreciate it!
Upvotes: 1
Views: 3726
Reputation: 2214
I know of some options that might help you:
Keep in mind that the built-in cameras in mobile devices mostly produce low-quality images (blurred, noisy, skewed, etc.). OCR itself is a resource-consuming process, and if you add substantial image preprocessing on top of it, low-end and mid-range mobile devices (which are likely to run Android) could suffer unexpectedly slow performance or even run out of resources. That's OK for free or study projects, but if you're planning a commercial app, consider using a better SDK.
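To give an idea of what a cheap preprocessing pass for small text might look like, here is a minimal sketch in plain Java: grayscale conversion, a bilinear upscale, and a global Otsu threshold. The class name `SmallTextPreprocessor` and its methods are hypothetical, not part of Tesseract or any Android API, and the 2x scale factor is only an assumption that you would need to tune on your own images. It works on packed ARGB `int` arrays, which on Android you would typically obtain with `Bitmap.getPixels`.

```java
// Hypothetical helper, not part of Tesseract: a sketch of a light
// preprocessing pass (grayscale -> upscale -> Otsu binarization).
final class SmallTextPreprocessor {
    private SmallTextPreprocessor() {}

    /** Converts packed ARGB pixels to luminance values in 0..255. */
    static int[] toGray(int[] argb) {
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
            gray[i] = (r * 299 + g * 587 + b * 114) / 1000; // integer luma weights
        }
        return gray;
    }

    /** Bilinear upscale of a w*h grayscale image by an integer factor. */
    static int[] upscale(int[] gray, int w, int h, int factor) {
        int nw = w * factor, nh = h * factor;
        int[] out = new int[nw * nh];
        for (int y = 0; y < nh; y++) {
            float sy = Math.min((float) y / factor, h - 1);
            int y0 = (int) sy, y1 = Math.min(y0 + 1, h - 1);
            float fy = sy - y0;
            for (int x = 0; x < nw; x++) {
                float sx = Math.min((float) x / factor, w - 1);
                int x0 = (int) sx, x1 = Math.min(x0 + 1, w - 1);
                float fx = sx - x0;
                // Interpolate along x on both rows, then along y.
                float top = gray[y0 * w + x0] * (1 - fx) + gray[y0 * w + x1] * fx;
                float bot = gray[y1 * w + x0] * (1 - fx) + gray[y1 * w + x1] * fx;
                out[y * nw + x] = Math.round(top * (1 - fy) + bot * fy);
            }
        }
        return out;
    }

    /** Otsu's method: the threshold that maximizes between-class variance. */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) hist[v]++;
        long total = gray.length, sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];
        long sumBg = 0, wBg = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wBg += hist[t];
            if (wBg == 0) continue;      // no background pixels yet
            long wFg = total - wBg;
            if (wFg == 0) break;         // no foreground pixels left
            sumBg += (long) t * hist[t];
            double mBg = (double) sumBg / wBg;
            double mFg = (double) (sumAll - sumBg) / wFg;
            double var = (double) wBg * wFg * (mBg - mFg) * (mBg - mFg);
            if (var > bestVar) { bestVar = var; best = t; }
        }
        return best;
    }

    /** Maps pixels at or below the threshold to 0 (black), others to 255. */
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) out[i] = gray[i] <= threshold ? 0 : 255;
        return out;
    }
}
```

On a device you would read the pixels from the captured `Bitmap`, run `toGray`, `upscale` (for example with a factor of 2), `otsuThreshold` and `binarize`, write the result into a new `Bitmap`, and pass that to the OCR engine. A single global threshold assumes fairly even lighting; photos with shadows may need an adaptive threshold instead, which costs more CPU time.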
Have a look at this question for details: OCR for android
Upvotes: 3