Reputation: 41
When creating the CLUSTERING data using
mftraining -F font_properties -U unicharset -O lan.unicharset *.tr
I get the following message
C:\Users\ \AppData\Local\Tesseract-OCR>mftraining -F font_properties -U unicharset -O eng1.unicharset eng.lucidaconsole.box.tr
Warning: No shape table file present: shapetable
Failed to load unicharset from file unicharset
Building unicharset for training from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Reading eng.lucidaconsole.box.tr ...
Flat shape table summary: Number of shapes = 0 max unichars = 0 number with multiple unichars = 0
Done!
It rebuilt the unicharset I had already created and gave me a 1 KB file containing only this:
1
NULL 0 NULL 0
At this point I don't know what to do. I'm a first-time user of this program, but this doesn't seem right to me.
Upvotes: 3
Views: 3049
Reputation: 1932
If you're using Windows, I think this tool can help make the training process much, MUCH easier. I went through a lot of trouble learning how to train Tesseract before I started using it. Just download the latest version and read the user manual, and you will be able to train your Tesseract without touching the keyboard!
Upvotes: 0
Reputation: 18206
It looks like you need to cluster the character features of the training pages, as described here.
I believe the basic command for this is something like:
shapeclustering -F font_properties -U unicharset lang.fontname.exp0.tr lang.fontname.exp1.tr ...
This appears to be something that was added in version 3.02.
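Before re-running the training tools, it might be worth checking whether the unicharset file you pass with -U really contains your characters. Here is a rough Python sketch of such a check. It assumes the layout seen in the question's output (a count on the first line, then one entry per line, with NULL as a placeholder entry); the function names are my own and are not part of Tesseract.

```python
from pathlib import Path


def unicharset_entries(path):
    """Return the entry lines of a unicharset file, skipping the count line.

    Assumption: the first line is the entry count and every later
    non-empty line is one entry, as in the output shown in the question.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Drop the count line and any blank lines.
    return [line for line in lines[1:] if line.strip()]


def looks_empty(path):
    """True when the file holds nothing except NULL placeholder entries."""
    entries = unicharset_entries(path)
    return all(entry.split()[0] == "NULL" for entry in entries)


if __name__ == "__main__":
    # "unicharset" is the file name passed to -U in the question.
    # A missing file raises FileNotFoundError, which is also worth knowing.
    print("empty" if looks_empty("unicharset") else "has character entries")
```

If this reports an empty file, or the file is missing from the directory you run the tools in, that would be consistent with the "Failed to load unicharset from file unicharset" lines in your log.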
Upvotes: 2