Glory to Russia

Why does Tesseract fail with "Empty page" with this image?

I have the following screenshot:

Original screenshot

I want to extract the manuscript word count, 3.574 in this case, from that image (see red rectangle below).

Screenshot with the text that I want to OCR marked

To do this, I run the following script:

magick screenshot.png -crop 33x20+2+83 screenshot-cropped.png
tesseract screenshot-cropped.png screenshot-ocred -l eng

The first line cuts out the place with the word count and saves it in screenshot-cropped.png which looks like this:

The text to recognize as image
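
For readers unfamiliar with ImageMagick geometry strings, the crop argument 33x20+2+83 follows the WIDTHxHEIGHT+X+Y pattern: a region 33 pixels wide and 20 pixels high, offset 2 pixels from the left edge and 83 pixels from the top. The sketch below only takes that string apart to make the four numbers explicit. It assumes a POSIX shell rather than the Windows command prompt used in the question, and the variable names are illustrative.

```shell
# Split an ImageMagick geometry string of the form WIDTHxHEIGHT+X+Y.
geometry=33x20+2+83

size=${geometry%%+*}     # everything before the first "+"  (33x20)
offsets=${geometry#*+}   # everything after the first "+"   (2+83)

width=${size%x*}         # part before "x"
height=${size#*x}        # part after "x"
x=${offsets%+*}          # part before the remaining "+"
y=${offsets#*+}          # part after the remaining "+"

echo "width=$width height=$height x=$x y=$y"
```

The resulting crop is only 33 by 20 pixels, so Tesseract has very little image to work with.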

tesseract screenshot-cropped.png screenshot-ocred -l eng is supposed to recognize the characters and save them as text in screenshot-ocred.txt.

However, it produces the following error:

C:\usr\dp\ref\marcomm\2020_04_22_wordCounter>ocr.bat

C:\usr\dp\ref\marcomm\2020_04_22_wordCounter>magick screenshot.png -crop 33x20+2+83 screenshot-cropped.png

C:\usr\dp\ref\marcomm\2020_04_22_wordCounter>tesseract screenshot-cropped.png screenshot-ocred -l eng
Tesseract Open Source OCR Engine v5.0.0-alpha.20200328 with Leptonica
Empty page!!
Empty page!!

How can I fix it, i.e., make Tesseract recognize 3.574 and save it in screenshot-ocred.txt?

Note: All of this runs on Windows. Here is the output of magick --version:

C:\usr\dp\ref\marcomm\2020_04_22_wordCounter>magick --version
Version: ImageMagick 7.0.10-7 Q16 x64 2020-04-20 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2018 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC Modules OpenCL OpenMP(2.0)
Delegates (built-in): bzlib cairo flif freetype gslib heic jng jp2 jpeg lcms lqr lzma openexr pangocairo png ps raw rsvg tiff webp xml zlib

Upvotes: 0

Views: 1284

Answers (1)

Glory to Russia

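Adding --psm 7 to the Tesseract call solved the problem (tesseract screenshot-cropped.png screenshot-ocred -l eng --psm 7). With --psm 7, Tesseract treats the image as a single line of text instead of running its default automatic page segmentation, which apparently finds no text in such a small crop.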
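
Put together, the two steps might look like the sketch below. It is written for a POSIX shell as an assumption; on Windows the same two commands can go into ocr.bat unchanged. The function name ocr_word_count and the DRY_RUN switch are hypothetical additions for illustration and are not part of ImageMagick or Tesseract.

```shell
#!/bin/sh
# Crop the word-count region, then OCR it as a single line of text.

ocr_word_count() {
    src=$1    # input screenshot
    crop=$2   # cropped image written by ImageMagick
    out=$3    # output base name; Tesseract appends .txt itself

    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set and non-empty,
    # so the commands are printed instead of executed (hypothetical switch).
    ${DRY_RUN:+echo} magick "$src" -crop 33x20+2+83 "$crop" &&
    ${DRY_RUN:+echo} tesseract "$crop" "$out" -l eng --psm 7
}
```

Calling ocr_word_count screenshot.png screenshot-cropped.png screenshot-ocred should then write the recognized text to screenshot-ocred.txt, provided magick and tesseract are on the PATH.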

Upvotes: 1