Reputation: 11
I can't get node-tesseract to read a form correctly. Only the printed text of the form is recognized and returned correctly, whereas the handwritten text comes back as a string of special characters.
My code is:
    var tesseract = require('node-tesseract');

    var options = {
        l: 'deu',   // recognition language
        psm: 6,     // page segmentation mode
        env: {
            maxBuffer: 4096 * 4096
        }
    };

    tesseract.process('./server/images/form.jpg', options, function (err, text) {
        if (err) {
            return console.log("An error occurred: ", err);
        }
        console.log("Recognized text:");
        console.log(text);
    });
Input:  OWNER Brian Dude
Output: OW_NER ägga ] )ggé;= ‘
Here, OWNER is a printed text field on the form.
Upvotes: 1
Views: 1289
Reputation: 111
Tesseract Training for Handwritten Digit Recognition
Training Tesseract for Roman Font Handwriting
Check out the official Tesseract Training page.
The following link walks you through the training process; it helped me a lot: https://web.archive.org/web/20170820212334/http://www.resolveradiologic.com:80/blog/2013/01/15/training-tesseract
Use a third-party GUI for Tesseract training; it will make your life much easier. I recommend tesseract4java and jTessBoxEditor (both work on OS X).
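Once training has produced a language file, the code from the question would presumably only need a different `l` value. Here is a minimal sketch, assuming a custom language named `hand` (a hypothetical name, standing for a `hand.traineddata` file placed in Tesseract's tessdata directory). The helper `recognizeHandwriting` is made up for illustration and takes the node-tesseract module as a parameter, so it can be tried without Tesseract installed.

```javascript
// Sketch only: 'hand' is a hypothetical name for a custom-trained language,
// i.e. a hand.traineddata file in Tesseract's tessdata directory.
function recognizeHandwriting(tesseract, imagePath, lang, callback) {
  var options = {
    l: lang,  // language option, as in the question's code
    psm: 6    // same page segmentation mode as in the question
  };
  tesseract.process(imagePath, options, callback);
}

// Usage (needs node-tesseract and the trained data installed):
// recognizeHandwriting(require('node-tesseract'), './server/images/form.jpg', 'hand',
//   function (err, text) {
//     if (err) { return console.error(err); }
//     console.log(text);
//   });
```

How well this works depends entirely on the quality of the training data. Handwriting varies a lot between writers, so results may still be poor even with a custom language.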
Upvotes: 3