Reputation: 1595
I have input files like this (digits only, but in several font types), and I want to use them as training data for Tesseract. Should I put samples of a single font type in each TIFF file, or can I mix several font types in one TIFF file?
Which is better? Any tips would be appreciated. Thanks for your help.
Upvotes: 0
Views: 836
Reputation: 8355
Use one font style per training image. The Tesseract Training Wiki states the following:
The training data should be grouped by font. Ideally, all samples of a single font should go in a single tiff file, but this may be multi-page tiff (if you have libtiff or leptonica installed), so the total training data in a single font may be many pages and many 10s of thousands of characters, allowing training for large-character-set languages.
DO NOT MIX FONTS IN AN IMAGE FILE (In a single .tr file to be
precise.) This will cause features to be dropped at clustering, which
leads to recognition errors.
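As a rough illustration of that grouping rule, here is a minimal Python sketch that sorts sample pages by font and plans one output TIFF per font. The `(font, page)` input pairs, the `plan_training_files` helper, the `num` language code, and the sample file names are all hypothetical; the output names assume the `[lang].[fontname].exp[num].tif` convention used in Tesseract 3.x training. It only plans the grouping and does not write any image files.

```python
from collections import defaultdict


def plan_training_files(samples, lang="num"):
    """Group sample pages by font and plan one training TIFF per font.

    samples: iterable of (font_name, page_path) pairs (hypothetical format).
    Returns a dict mapping each planned TIFF name to its sorted page list.
    """
    pages_by_font = defaultdict(list)
    for font, page in samples:
        pages_by_font[font].append(page)

    # One file per font, so fonts are never mixed within a training image.
    return {
        f"{lang}.{font}.exp0.tif": sorted(pages)
        for font, pages in sorted(pages_by_font.items())
    }


# Hypothetical sample pages in two fonts.
samples = [
    ("arial", "arial_p2.png"),
    ("courier", "courier_p1.png"),
    ("arial", "arial_p1.png"),
]

plan = plan_training_files(samples)
for tiff_name, pages in plan.items():
    print(tiff_name, pages)
```

Each planned file could then be assembled as a multi-page TIFF with whatever imaging tool you prefer, keeping all pages of one font together.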
Upvotes: 1