Bằng Rikimaru

Reputation: 1595

Tesseract - How should I handle multiple font types?

My input contains only numbers, but in several font types. If I want to use Tesseract to create training data, should I put a single font type in each TIFF file, or several font types in one TIFF file?


Which is better? Any tips would be appreciated. Thanks for your help.

Upvotes: 0

Views: 836

Answers (1)

nguyenq

Reputation: 8355

Use one font style per training image. The Tesseract Training Wiki states the following:

  • The training data should be grouped by font. Ideally, all samples of a single font should go in a single tiff file, but this may be multi-page tiff (if you have libtiff or leptonica installed), so the total training data in a single font may be many pages and many 10s of thousands of characters, allowing training for large-character-set languages.

  • DO NOT MIX FONTS IN AN IMAGE FILE (In a single .tr file to be precise.) This will cause features to be dropped at clustering, which leads to recognition errors.
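
As a rough illustration of the grouping rule quoted above, here is a minimal Python sketch that sorts sample images by font and plans one (possibly multi-page) TIFF per font. The language code `num`, the font names, and the scan file names are hypothetical placeholders. The `lang.fontname.expN.tif` naming and the `nobatch box.train` invocation are the ones I recall from the Tesseract 3.x training instructions, so check them against the wiki for your version. The script only builds a plan and command strings; it does not create any images or run Tesseract.

```python
from collections import defaultdict


def plan_training_files(samples, lang="num"):
    """Group (font, image_path) pairs so that each font gets its own TIFF.

    Returns a dict mapping a per-font TIFF name to the list of source
    images that should become the pages of that TIFF.
    """
    by_font = defaultdict(list)
    for font, path in samples:
        by_font[font].append(path)

    plan = {}
    for font in sorted(by_font):
        # Assumed naming convention: lang.fontname.expN.tif
        plan[f"{lang}.{font}.exp0.tif"] = by_font[font]
    return plan


def training_commands(plan):
    """Build one training command per font-specific TIFF (one .tr file each)."""
    commands = []
    for tiff_name in plan:
        base = tiff_name[: -len(".tif")]
        commands.append(f"tesseract {tiff_name} {base} nobatch box.train")
    return commands


if __name__ == "__main__":
    # Hypothetical input: each scan is labelled with the font it contains.
    samples = [
        ("arial", "scan_01.png"),
        ("courier", "scan_02.png"),
        ("arial", "scan_03.png"),
    ]
    plan = plan_training_files(samples)
    for tiff_name, pages in plan.items():
        print(tiff_name, "<-", pages)
    for command in training_commands(plan):
        print(command)
```

Because every TIFF contains exactly one font, each resulting .tr file also covers a single font, which is the condition the wiki asks for.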

Upvotes: 1
