MrMulliner

Reputation: 149

What is the meaning of the fifth column in Tesseract box files?

While training Tesseract with box files, I needed to write a script to shift some of the boxes. I opened a box file to work out which columns correspond to X/Y/W/H, and discovered a fifth column. The Tesseract wiki doesn't explain it, and the example in the "Make Box Files" section contains only zeros in the fifth column. My trained file contains other symbols there. For example, these are some of the symbols I found: [":,}'4.*<&\;\|]. What do these mean?

Upvotes: 0

Views: 194

Answers (2)

nguyenq

Reputation: 8355

You probably meant the sixth or last column, which represents the page number (see Training wiki). It sounds like your box file was not correctly generated.
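For the box-shifting script mentioned in the question, here is a minimal sketch, assuming each line follows the usual six-field layout `symbol left bottom right top page` (corner coordinates measured from the bottom-left of the image, rather than X/Y/W/H). The names `Box`, `parse_box_line`, `shift_box` and `format_box` are made up for illustration and are not part of Tesseract.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Assumed field order of one box-file line: symbol left bottom right top page
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the symbol field stays intact,
    # whatever characters it contains.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def shift_box(box: Box, dx: int, dy: int) -> Box:
    # Move the box by (dx, dy); the symbol and page number are unchanged.
    return box._replace(
        left=box.left + dx,
        right=box.right + dx,
        bottom=box.bottom + dy,
        top=box.top + dy,
    )


def format_box(box: Box) -> str:
    # Write the box back out in the same six-field layout.
    return f"{box.symbol} {box.left} {box.bottom} {box.right} {box.top} {box.page}"
```

If a line does not have six fields, `parse_box_line` raises a `ValueError`, which can be a quick way to spot a box file that was not generated correctly.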

Upvotes: 1

sashoalm

Reputation: 79585

If I remember correctly, the fifth column is for a whitelist of characters. That way you can specify digits-only for one region, while another is for text.

Tesseract will recognize only symbols from the whitelist for a given region.

Upvotes: 1
