Philippe

Reputation: 319

File tesseract.exe does not exist

I have installed the pytesseract library using

pip install pytesseract

When I tried to use the image_to_string method, it raised a

FileNotFoundError: [WinError 2] The system can not find the file specified

I googled it and found that I should edit the pytesseract.py file, where the line

tesseract_cmd = 'tesseract'

should become

tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract'  

I searched but didn't find any tesseract.exe file in my Python folder. I then reinstalled the library, but the file still wasn't there. Finally, I replaced the line with:

tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract'

and my program threw:

pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')

What can I do to make my program work?

P.S. Here is my program code:

from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open(r'C:\Users\Филипп\Desktop\ImageToText_Python\NoName.png') 
print(im)

txt = image_to_string(im)
print(txt)

Full traceback of the first attempt:

File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
text = pytesseract.image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Python\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system can not find the file specified

Full traceback of the second attempt:

Traceback (most recent call last):
File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module>
txt = image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')

Upvotes: 6

Views: 23328

Answers (3)

Deepan Raj

Reputation: 395

  1. If you are using Windows, you have to install tesseract-ocr from this link (3.05.01 is the stable version and supports foreign-language extraction), then add the path where you installed it to the PATH environment variable.

  2. If you are using Ubuntu, type "sudo apt-get install tesseract-ocr" in a terminal.

  3. pytesseract is a Python wrapper that gives you access to the tesseract-ocr software.

Note 1: If you want to extract text in foreign languages, you have to include the tessdata files for those languages in the installation path.

Note 2: Python 2 does not support foreign-language extraction well, so it is better to use Python 3.
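
To check that the installation from steps 1 and 2 is actually visible to Python, something like the following may help. It is only a sketch: the helper name tesseract_version is made up for this example, and it assumes the executable answers to the standard --version flag. stderr is merged into stdout because some Tesseract releases print the version there.

```python
import subprocess


def tesseract_version(executable='tesseract'):
    """Return the first line printed by `<executable> --version`,
    or None if the executable cannot be started at all."""
    try:
        proc = subprocess.run(
            [executable, '--version'],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some releases report the version on stderr
            universal_newlines=True,
        )
    except OSError:
        # Includes FileNotFoundError: the same failure as in the question
        return None
    lines = proc.stdout.splitlines()
    return lines[0] if lines else None


version = tesseract_version()
if version is None:
    print('tesseract is not installed or not on PATH')
else:
    print('Found:', version)
```

If this prints that tesseract was not found even after installing, the PATH change has probably not reached the running shell or IDE yet; restarting it is usually enough.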

Upvotes: 0

Antwane

Reputation: 22618

From the project's README:

try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))

So you have to make sure tesseract.exe is on your computer (for example by installing Tesseract-OCR), then either add the containing folder to your PATH environment variable or declare its location using the pytesseract.pytesseract.tesseract_cmd attribute.
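
The two options can be combined: look on PATH first and fall back to a known install location. Below is a minimal sketch of that idea; find_executable is a hypothetical helper written for this answer (not part of pytesseract), and the fallback path is just the example location from the README comment, so adjust it to your own install folder.

```python
import os
import shutil


def find_executable(name='tesseract', fallbacks=()):
    """Return a usable path for `name`: first from PATH, then from the
    first entry in `fallbacks` that exists as a file, otherwise None."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


# Assumed fallback location; replace with the folder you installed into.
path = find_executable(
    'tesseract',
    fallbacks=[r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'],
)
if path is None:
    print('tesseract not found; install Tesseract-OCR first')
else:
    print('Using', path)
    # Then, in your own script:
    # pytesseract.pytesseract.tesseract_cmd = path
```

Setting tesseract_cmd from your own script this way avoids editing pytesseract.py inside site-packages, where the change would be lost on reinstall.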

Upvotes: 6

Philippe

Reputation: 319

For people in the same situation as me: here is a tesseract-OCR downloader. After the download finishes, go to the install path you chose; there should be a file named tesseract.exe. Copy the path to this file and paste it into pytesseract.py as the value of tesseract_cmd.
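
One thing to watch when pasting a Windows path into Python source: backslashes are escape characters, and the "\t" in "\tesseract.exe" silently becomes a tab. A small sketch of the safe spellings follows; the install folder is an assumed example, not necessarily yours.

```python
from pathlib import PureWindowsPath

# Assumed install location; substitute the folder you chose in the installer.
raw_path = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
escaped_path = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

# A raw string and doubled backslashes spell the same path.
print(raw_path == escaped_path)         # True

# Forgetting to escape the last backslash turns "\t" into a tab character.
broken_path = 'C:\\Program Files (x86)\\Tesseract-OCR\tesseract.exe'
print('\t' in broken_path)              # True

# The correctly spelled path ends in the executable's file name.
print(PureWindowsPath(raw_path).name)   # tesseract.exe

# Then, in your own script:
# pytesseract.pytesseract.tesseract_cmd = raw_path
```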

Upvotes: 4
