Reputation: 53
I'm trying to train Tesseract with images and found https://github.com/tesseract-ocr/tesstrain. I followed its instructions for training on images, but I keep getting this error:
Tesseract Open Source OCR Engine v5.0.0-alpha-635-g90405 with Leptonica
Page 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
find data/foo-ground-truth -name '*.lstmf' | python3 shuffle.py 0 > "data/foo/all-lstmf"
Error: missing ground truth for training
Makefile:147: recipe for target 'data/foo/list.train' failed
make: *** [data/foo/list.train] Error 1
It keeps showing this error: Error: missing ground truth for training
The command I used was: make training
The image and ground truth text both come from ocrd-testset.zip in the same repo.
What could be the solution to this?
EDIT: Sorry, I forgot to mention that I used only one pair (training image plus ground truth text) from ocrd-testset.zip.
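Using a single pair may matter here. The following is only a sketch of the arithmetic, assuming the Makefile sizes the training list as the integer part of (number of samples × training ratio) and that the ratio is 0.90. Both are assumptions about the Makefile that are not confirmed in this post, and split_counts is a hypothetical helper, not part of tesstrain.

```python
import math


def split_counts(total, ratio_train=0.90):
    """Return (train, eval) sizes, truncating the training share to an integer.

    ratio_train=0.90 is an assumed default, not a value taken from this post.
    """
    train = math.floor(total * ratio_train)
    return train, total - train


if __name__ == "__main__":
    for total in (1, 2):
        train, evaluation = split_counts(total)
        print(f"{total} sample(s): train={train}, eval={evaluation}")
```

Under those assumptions, one sample leaves zero lines for training, which would match the message about missing ground truth for training.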
Upvotes: 1
Views: 2602
Reputation: 181
I followed the instructions in https://github.com/tesseract-ocr/tesstrain on Windows 10.
At first it kept showing Error: missing ground truth for training. That is because this part of the Makefile doesn't work on Windows:
$(ALL_LSTMF): $(patsubst %.gt.txt,%.lstmf,$(shell find $(GROUND_TRUTH_DIR) -name '*.gt.txt'))
@mkdir -p $(OUTPUT_DIR)
find $(GROUND_TRUTH_DIR) -name '*.lstmf' | python3 shuffle.py $(RANDOM_SEED) > "$@"
I changed it to:
$(ALL_LSTMF): $(patsubst %.gt.txt,%.lstmf,$(wildcard $(GROUND_TRUTH_DIR)/*.gt.txt))
@mkdir -p $(OUTPUT_DIR)
find $(GROUND_TRUTH_DIR) -name '*.lstmf' -exec echo {} \; | sort -R -o "$@"
Then the error disappeared.
The changed code comes from an older version of tesseract-ocr/tesstrain. It should work on both Linux and Windows. Maybe you could try it.
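If the shell tools are the problem, the listing step can also be done without find or sort. This is a minimal sketch, not the repo's shuffle.py: write_shuffled_lstmf_list is a hypothetical helper, and the paths in the usage line are simply the ones from the log in the question.

```python
import random
from pathlib import Path


def write_shuffled_lstmf_list(ground_truth_dir, output_file, seed=0):
    """Write every .lstmf path under ground_truth_dir to output_file, one per line.

    The order is shuffled with a fixed seed so repeated runs give the same list.
    """
    # Sort first so the result does not depend on directory iteration order.
    paths = sorted(p.as_posix() for p in Path(ground_truth_dir).rglob("*.lstmf"))
    random.Random(seed).shuffle(paths)

    out = Path(output_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    # newline="\n" keeps Unix line endings even when run on Windows.
    with out.open("w", encoding="utf-8", newline="\n") as fh:
        fh.writelines(f"{p}\n" for p in paths)
    return paths


if __name__ == "__main__":
    # Paths taken from the log in the question; adjust to your model name.
    listed = write_shuffled_lstmf_list("data/foo-ground-truth", "data/foo/all-lstmf")
    print(f"wrote {len(listed)} entries")
```

An empty result means no .lstmf files were generated at all, which is worth ruling out before changing the Makefile.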
Upvotes: 0