Damon Schulz
Tesseract Can't Find Any Languages

UPDATE: I have reinstalled Tesseract into my 'Program Files (x86)' folder, and now when I run tesseract --version it responds with the version rather than saying it isn't recognized as a cmdlet.

This seems to be a pretty common problem, and I have been trying different ways to make this program work. I know there are a lot of existing questions similar to mine, but since none of the methods I have found work, I am hoping to get some fresh ideas. TIA


"pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.')"


from pdf2image import convert_from_path
import pytesseract

images = convert_from_path("CHECK_12-01-22.pdf", 500, poppler_path=r'C:\Program Files\poppler-23.01.0\Library\bin')
for i, image in enumerate(images):
    fname = 'image' + str(i) + '.png'
    image.save(fname, "PNG")

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

text = pytesseract.image_to_string(image, lang='eng')
# text = pytesseract.image_to_string(image, lang='eng', config='--tessdata-dir "C:\\Program Files\\Tesseract-OCR\\tessdata"')

I am using Windows 11 and PyCharm.

I have Poppler working, and it converts my PDF to images, but when I try to run Tesseract it says no languages were found. I have tried a few different methods to get it working. First, my environment variables are set: [image of environment variable path]

Then I tried using config in my code.

text = pytesseract.image_to_string(image, lang='eng', config='--tessdata-dir "C:\\Program Files\\Tesseract-OCR\\tessdata"')

which also didn't work. I've downloaded different language data files and put them in the tessdata folder to no avail.

Damon Schulz
Here is the solution I was able to find:

tessdata_dir_config = r'--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata"'

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

text = pytesseract.image_to_string(image, lang='eng', config=tessdata_dir_config)

I was initially using a 32-bit version while I have a 64-bit operating system, so I uninstalled Tesseract-OCR, found the x86 version, and reinstalled that to my Program Files (x86) folder. I had to define the tessdata path before setting tesseract_cmd. I made the path into a variable that I was able to pass as an argument when converting the image to text.
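The mismatch in the question (tesseract_cmd under Program Files, the error message under Program Files (x86)) is the kind of thing a small pre-flight check can catch. Below is a minimal sketch, assuming the tessdata folder sits next to tesseract.exe as it does in the paths above; the helper name tessdata_config_for is made up for illustration and is not part of pytesseract.

```python
from pathlib import Path


def tessdata_config_for(tesseract_cmd, lang="eng"):
    """Build a --tessdata-dir config string for the install that owns tesseract_cmd.

    Assumption: the tessdata folder is a sibling of the executable.
    Raises FileNotFoundError if the executable or the language file is missing.
    """
    exe = Path(tesseract_cmd)
    tessdata = exe.parent / "tessdata"
    traineddata = tessdata / f"{lang}.traineddata"
    if not exe.is_file():
        raise FileNotFoundError(f"tesseract executable not found: {exe}")
    if not traineddata.is_file():
        raise FileNotFoundError(f"language data not found: {traineddata}")
    # Double quotes keep the space in a path such as "Program Files (x86)" together.
    return f'--tessdata-dir "{tessdata}"'
```

The returned string could then be passed as config= to pytesseract.image_to_string, with the same path assigned to pytesseract.pytesseract.tesseract_cmd, so both always refer to the same install.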

Have you set the system environment variable right? Check with the command:
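The same kind of check can be done from Python. This is a minimal sketch, assuming the variable in question is TESSDATA_PREFIX (the one named in the error message). Because Tesseract versions differ on whether the variable should point at the tessdata folder itself or at its parent, the sketch looks in both places; the function name find_traineddata is hypothetical.

```python
import os
from pathlib import Path


def find_traineddata(environ=os.environ):
    """Return the *.traineddata files reachable through TESSDATA_PREFIX.

    Looks in the folder the variable names and in a "tessdata" subfolder of it.
    Returns an empty list if the variable is unset or nothing is found.
    """
    prefix = environ.get("TESSDATA_PREFIX")
    if not prefix:
        return []
    base = Path(prefix)
    found = []
    for folder in (base, base / "tessdata"):
        if folder.is_dir():
            found.extend(sorted(folder.glob("*.traineddata")))
    return found
```

An empty result from find_traineddata() inside the same interpreter that PyCharm runs would suggest the variable is either not visible to that process or points somewhere without language files.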


In my environment the system variable is set as shown here: [screenshot]. You should see the eng files in this directory: [screenshot]

When starting a Tesseract application, tesseract.exe needs to find the tessdata folder correctly.

There are many ways to arrange that. In a batch file for a specific case such as MuPDF, I may make the first command line:

set TESSDATA_PREFIX=C:\Apps\PDF\mupdf\mupdf-1.21.0-windows-tesseract\mupdf-1.21.0-windows-tesseract\tessdata

Or I may have a fall-back preset in the user environment, pointing to a folder where I keep a copy of eng.traineddata (22.4 MB, 17/01/2023, 01:16:15).


To get that to stick, both now and in future, it sometimes needs a log-out and log-in before the next command shell will use it.

In the above case on Windows 10 I did NOT need to log out; the variable is available to fresh command shells. But beware of shells started before the change, such as some file commanders, which need stopping and restarting.
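For Python callers, a per-process alternative to the batch set line avoids the stale-shell question, because the variable is handed directly to the child process instead of being read from the system settings. A minimal sketch using only the standard library; the function name run_with_tessdata_prefix is made up for illustration.

```python
import os
import subprocess


def run_with_tessdata_prefix(cmd, tessdata_dir):
    """Run cmd with TESSDATA_PREFIX set for that child process only.

    The parent's environment is copied, so nothing is changed system-wide
    and no log-out or shell restart is involved.
    """
    env = os.environ.copy()
    env["TESSDATA_PREFIX"] = str(tessdata_dir)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

For example, run_with_tessdata_prefix(["tesseract", "--list-langs"], some_tessdata_dir) would be one way to see which languages Tesseract can load from a given folder, assuming tesseract is on PATH and some_tessdata_dir is your own path.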


