user10907559

Tesseract can't recognize images with handwritten text, what can I do?

As I mentioned in my previous question, the problem I'm facing is that I have hundreds of images of handwritten notes. They were written by different people, but they are in sequence, so you know that, for example, person1 wrote img1.jpg -> img100.jpg. The handwriting style varies a lot from person to person, but some parts of the notes are always fixed (maybe that can help an algorithm).
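
To make the "in sequence" structure concrete, here is a minimal sketch of how the files could be grouped per writer, so that any later step (training, tuning, evaluation) can be done writer by writer. The `img<N>.jpg` naming comes from the example above; the function names and the `writer_ranges` mapping are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict


def image_number(filename):
    """Return the integer in a name like 'img42.jpg', or None if it does not match."""
    match = re.fullmatch(r"img(\d+)\.jpg", filename)
    return int(match.group(1)) if match else None


def group_by_writer(filenames, writer_ranges):
    """Group image names by writer.

    writer_ranges maps a writer label to an inclusive (first, last) pair of
    image numbers. These ranges are assumed to be known in advance.
    """
    groups = defaultdict(list)
    for name in filenames:
        number = image_number(name)
        if number is None:
            continue  # skip files that do not follow the img<N>.jpg pattern
        for writer, (first, last) in writer_ranges.items():
            if first <= number <= last:
                groups[writer].append(name)
                break
    # Keep each writer's pages in numeric order, not string order.
    for names in groups.values():
        names.sort(key=image_number)
    return dict(groups)
```

For example, `group_by_writer(["img2.jpg", "img1.jpg", "img101.jpg"], {"person1": (1, 100), "person2": (101, 200)})` puts the first two files under `person1` and the third under `person2`.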

I followed one user's suggestion to use Tesseract, but it couldn't recognize any of the text. The text is not in English, but I did use the appropriate language data file.
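
For reference, this is roughly the kind of call I mean, as a minimal sketch. It assumes the `tesseract` command-line tool (version 4 or later, for the `--psm` flag) is installed and on the PATH. The language code `deu` and page segmentation mode 6 are placeholders chosen for illustration, not the values from my actual run, and the function names are made up.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang, psm=6):
    """Build the argument list for one Tesseract run.

    'stdout' as the output base makes Tesseract print the recognized text
    instead of writing it to a file. '-l' selects the language data file and
    '--psm' the page segmentation mode (6 = a single uniform block of text).
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(image_path, lang, psm=6):
    """Run Tesseract and return its text output, or None if it is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=False,  # a failed run just yields empty output here
    )
    return result.stdout


if __name__ == "__main__":
    # 'deu' (German) is only an example language code.
    print(run_tesseract("img1.jpg", "deu"))
```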

My knowledge of AI is limited, but from searching and looking at some papers it looks like this could be done with a CNN. Can someone guide me on what I should do from here? I'd like to go forward with the project, but I also don't have a lot of time to learn about neural networks. How challenging is it to implement one that solves this task?

Upvotes: 1

Views: 4549

Answers (1)

timguy

Reputation: 2582

I wouldn't use Tesseract for handwriting recognition. You can train Tesseract for handwriting, but out of the box it is geared toward printed text, where it works well across a lot of fonts and languages.

Here are two links on how to train it yourself:

I had the best results with Azure Recognition and good results with Amazon Recognition: https://aws.amazon.com/en/recognition I would like to have an offline Java library for this but haven't found one yet. My next step will be to try the ABBYY services, because they can also focus on separated handwritten characters: https://abbyy.technology/en:features:ocr:icr

Update

If somebody finds a library or a good service, even years later, I would be happy to see it in the comments.

Upvotes: 3
