Reputation: 319
I have installed the pytesseract
library using
pip install pytesseract
When I tried to use the image_to_string
method, it gave me a
FileNotFoundError: [WinError 2] The system can not find the file specified
I googled it and found that I should change something in the pytesseract.py file and the line
tesseract_cmd = 'tesseract'
should become
tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract'
I searched and didn't find any tesseract.exe
files in my Python folder. I then reinstalled the library, but the file still wasn't there. Finally, I replaced the line with:
tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract'
and my program threw:
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What can I do to make my program work?
P.S. Here is my program code:
from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter
im = Image.open(r'C:\Users\Филипп\Desktop\ImageToText_Python\NoName.png')
print(im)
txt = image_to_string(im)
print(txt)
Full traceback of the first attempt:
File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
text = pytesseract.image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Python\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system can not find the file specified
Full traceback of the second attempt:
Traceback (most recent call last):
File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module>
txt = image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
Upvotes: 6
Views: 23328
Reputation: 395
If you are using Windows, you have to install tesseract-ocr from this link (3.05.01 is the stable version and supports foreign-language extraction), then add the folder where you installed it to the PATH environment variable.
If you are using Ubuntu, type "sudo apt-get install tesseract-ocr" in a terminal.
Pytesseract is only a Python wrapper that gives you access to the tesseract-ocr software; it does not include the tesseract executable itself.
Note 1: if you want to extract foreign languages, you have to place the corresponding tessdata files in the installation path.
Note 2: Python 2 does not support foreign-language extraction well, so it is better to use Python 3.
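A quick way to confirm that the installation and PATH change worked is to ask Python where it finds the executable. This is only a minimal sketch using the standard library; the helper name tesseract_on_path is made up for illustration and is not part of pytesseract.

```python
import shutil


def tesseract_on_path(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None.

    `tesseract_on_path` is a hypothetical helper name, not a pytesseract API.
    """
    return shutil.which(name)


if __name__ == "__main__":
    location = tesseract_on_path()
    if location is None:
        print("tesseract was not found on PATH; install tesseract-ocr or add its folder to PATH")
    else:
        print("tesseract found at", location)
```

If this prints that tesseract was not found, pytesseract will most likely fail with the same FileNotFoundError, because it launches the executable through subprocess. Remember to open a new terminal after changing PATH so the change is picked up.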
Upvotes: 0
Reputation: 22618
From the project's README:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
So you have to make sure tesseract.exe is actually on your computer (for example, by installing Tesseract-OCR), and then either add the folder that contains it to your PATH environment variable or declare its location using the pytesseract.pytesseract.tesseract_cmd attribute.
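The two options above can be combined in one small helper: try PATH first, then fall back to a list of likely install folders. This is a sketch under assumptions, not the library's own method. The names find_tesseract, configure_pytesseract and CANDIDATE_PATHS are hypothetical, and the candidate folders are guesses based on the README example, so adjust them to your actual install location.

```python
import os
import shutil

# Hypothetical install locations (assumptions); change them to match your machine.
CANDIDATE_PATHS = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return a path to the tesseract executable, or None if it cannot be found."""
    on_path = shutil.which("tesseract")  # option 1: the folder is already on PATH
    if on_path:
        return on_path
    for path in candidates:  # option 2: look in known install folders
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the executable, failing early with a clear message."""
    import pytesseract  # imported here so find_tesseract() works without it

    exe = find_tesseract()
    if exe is None:
        raise RuntimeError("tesseract executable not found; install Tesseract-OCR first")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe


# Usage: call configure_pytesseract() once, before the first image_to_string() call.
```

Failing early with a clear message is arguably easier to debug than the bare WinError 2 raised from inside subprocess.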
Upvotes: 6