greenlasagna

Reputation: 91

How to generate a tiff/box file from an image to train Tesseract in Windows

I'm trying to train Tesseract on Windows, and for that I need a tiff/box file pair. I tried to create it with jTessBoxEditor, but it doesn't accept my images as input. I've also tried boxFactory, but it doesn't run properly. Does anyone know the best tool for creating the pair from images?

Thanks

Upvotes: 8

Views: 12053

Answers (2)

Michael Ohlrogge

Reputation: 10990

I had the same problem: I couldn't properly open images in jTessBoxEditor to work with their boxes. I realized that one essential requirement is that the .tif image and the .box file have identical names, apart from their extensions. Otherwise jTessBoxEditor can't tell which box file belongs to which image. So: generate the box file with the command from darkpotpot's answer, make sure the two file names match as described, and then click the "open" button in the Box Editor tab of jTessBoxEditor. That should work.
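To catch name mismatches before opening the Box Editor, a small script can report which .tif images have a .box file with the same base name. This is a minimal Python sketch, assuming all images use the .tif extension and sit in one folder; `pair_tif_and_box` is a hypothetical helper, not part of jTessBoxEditor or Tesseract.

```python
from pathlib import Path


def pair_tif_and_box(folder):
    """Match each .tif image with the .box file that shares its base name.

    Returns (pairs, missing): pairs maps each .tif path to its .box path,
    and missing lists the .tif files with no identically named .box file.
    """
    folder = Path(folder)
    pairs = {}
    missing = []
    for tif in sorted(folder.glob("*.tif")):
        # Same name, only the extension differs: TestImage.tif -> TestImage.box
        box = tif.with_suffix(".box")
        if box.exists():
            pairs[tif] = box
        else:
            missing.append(tif)
    return pairs, missing
```

For example, `pairs, missing = pair_tif_and_box(r"D:\testocr")` leaves `missing` empty only when every image in that folder has its box file.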

Upvotes: 1

darkpotpot

Reputation: 1381

If you have jTessBoxEditor, then you already have the Tesseract binaries. Go to the tesseract-ocr subfolder of jTessBoxEditor and run the following command:

tesseract.exe D:\testocr\TestImage.tif D:\testocr\TestImage batch.nochop makebox

It should generate the file D:\testocr\TestImage.box. Then, in jTessBoxEditor, go to the Box Editor tab and open your image. The box file is loaded automatically, so you can check that everything is OK and correct any mistakes.
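With many images, the makebox command can be run once per file from a script. This is a minimal Python sketch under a few assumptions: `makebox_command` and `make_boxes` are hypothetical helpers, the images use the .tif extension, and `tesseract` must point at the tesseract.exe in jTessBoxEditor's tesseract-ocr subfolder (pass its full path if it isn't found otherwise). The arguments are the same as in the command above.

```python
import subprocess
from pathlib import Path


def makebox_command(tif, tesseract="tesseract.exe"):
    """Build the makebox command line for one image.

    The output base is the image path without its extension, so Tesseract
    writes <name>.box next to <name>.tif with a matching name.
    """
    tif = Path(tif)
    return [tesseract, str(tif), str(tif.with_suffix("")), "batch.nochop", "makebox"]


def make_boxes(folder, tesseract="tesseract.exe"):
    """Run makebox for every .tif in folder and return the .box paths found."""
    produced = []
    for tif in sorted(Path(folder).glob("*.tif")):
        # check=True raises CalledProcessError if tesseract exits with an error
        subprocess.run(makebox_command(tif, tesseract), check=True)
        box = tif.with_suffix(".box")
        if box.exists():
            produced.append(box)
    return produced
```

A call might look like `make_boxes(r"D:\testocr", tesseract=r"C:\path\to\jTessBoxEditor\tesseract-ocr\tesseract.exe")`, where the tesseract path is a placeholder for wherever jTessBoxEditor is installed. Each generated .box file can then be opened and corrected in the Box Editor tab as described.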

Upvotes: 10
