Reputation: 100
I'm getting the error pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path. I tested my program just minutes before this came up and it worked perfectly. Then I tested it again and it keeps showing this error. I don't know what to do. Here is my code:
from PIL import ImageGrab
import cv2
import pytesseract
import numpy as np
from tkinter import Tk
from tkinter.filedialog import askopenfilename
ask = input("Do you want to ocr in realtime or choose a picture (r/p)?")
if ask == 'r':
    while True:
        screen = np.array(ImageGrab.grab(bbox=(700, 300, 1600, 1000)))
        # print('Frame took {} seconds'.format(time.time()-last_time))
        cv2.imshow('window', screen)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break
        print(pytesseract.image_to_string(screen, lang='eng', config='--psm 6'))
if ask == 'p':
    Tk().withdraw()  # we don't want a full GUI, so keep the root window from appearing
    filename = askopenfilename()  # show an "Open" dialog box and return the path to the selected file
    print(pytesseract.image_to_string(filename, lang='eng', config='--psm 6'))
Upvotes: 1
Views: 6884
Reputation: 1293
The two things that matter most are the installation itself and the trained data file. For example, the Arabic language requires the ara.traineddata file. I suggest using the proper language model and the latest version. On Windows, the 64-bit installer is:
tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64 bit)
To validate the installation, run this in PowerShell or a cmd terminal:
tesseract -v
It will output something like this: tesseract v5.0.0-alpha.20200328
On macOS, install it with Homebrew:
brew install tesseract
To validate the installation, run this in a terminal:
tesseract -v
It will output something like this: tesseract 4.1.1, followed by the image libraries it was built with: leptonica-1.80.0 libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1 Found AVX2 Found AVX Found FMA Found SSE
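The same check can be done from Python, which is handy because it tells you what your script's environment sees rather than what your terminal sees. This is only a minimal sketch using the standard library; the function name tesseract_version is my own, and it assumes the binary accepts --version.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if cmd cannot be run."""
    try:
        result = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "file not found" and "permission denied".
        return None
    # Read whichever stream has text, in case the version goes to stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract could not be started from this environment")
    else:
        print(version)
```

If this prints the "could not be started" message while tesseract -v works in your terminal, the script is most likely running with a different PATH.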
If you are not sure about the path, simply copy the ara.traineddata file into the same folder as your Python .py file:
import pytesseract
from PIL import Image
import os
os.environ["TESSDATA_PREFIX"] = "" # Leaving it empty because file is already copy pasted in the current directory
print(os.getenv("TESSDATA_PREFIX"))
# Copy paste the ara.traineddata file in the same directory as this python code
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
On Linux (Debian/Ubuntu), install it with:
sudo apt-get install tesseract-ocr
The validation and run code are the same as on macOS.
Also make sure the path is correct.
This code works fine if the ara.traineddata file is downloaded successfully:
import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
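Before calling OCR, you can check whether the trained data file is really where you think it is. The sketch below is standard library only; missing_traineddata and its tessdata_dir parameter are hypothetical names of mine, and it assumes the files follow the <lang>.traineddata naming used above and sit directly in the folder you pass in.

```python
from pathlib import Path


def missing_traineddata(lang, tessdata_dir="."):
    """Return the language codes in `lang` that have no .traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    # Several languages can be requested at once by joining them with '+', e.g. "eng+ara".
    codes = [code for code in lang.split("+") if code]
    return [
        code for code in codes
        if not (directory / f"{code}.traineddata").is_file()
    ]


if __name__ == "__main__":
    missing = missing_traineddata("ara", ".")
    if missing:
        print("Missing trained data for:", ", ".join(missing))
    else:
        print("All requested trained data files are present.")
```

This only confirms the file exists in that folder. Whether Tesseract actually looks there still depends on your TESSDATA_PREFIX setting.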
You can follow this tutorial for details. Here is the demo output of this tutorial which uses all available languages.
Upvotes: 0
Reputation: 860
This error can have several causes.
Check whether tesseract.exe is installed. If it is not, get the installer from the link below and install it. Remember the installation path for future reference.
https://github.com/UB-Mannheim/tesseract/wiki
If you already have Tesseract installed but pytesseract cannot find it from Python, you can set the path within the script like this:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
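If you don't remember which folder the installer used, you can try a few likely locations and use the first one that exists. This is just a sketch: the two candidate paths are the default-looking locations mentioned in this thread, not a complete list, and first_existing and configure_pytesseract are hypothetical helper names.

```python
import os

# Candidate locations taken from this thread; adjust to your own install folder.
CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def first_existing(paths):
    """Return the first path that points to an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(paths=CANDIDATES):
    """Point pytesseract at the first candidate that exists; return the path used."""
    import pytesseract  # imported here so first_existing works without it

    found = first_existing(paths)
    if found is not None:
        pytesseract.pytesseract.tesseract_cmd = found
    return found


if __name__ == "__main__":
    print(first_existing(CANDIDATES))
```

Call configure_pytesseract() once at the top of your script, before the first image_to_string call. If it returns None, none of the candidates exist and Tesseract is probably installed elsewhere or not at all.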
Upvotes: 1
Reputation: 2096
I was stuck on the same problem in the past. I think you have to make sure that you have done the following:
pip install pytesseract
Add a new variable called 'tesseract' to your environment variables, with a value of
C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
After that, running tesseract on the command line should print its usage information.
That's it :)
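If the variable alone does not help, another option is to put the Tesseract folder on PATH. As far as I know, pytesseract's default command is the bare name tesseract, which is looked up through PATH. Here is a minimal sketch that does this for the current Python process only; prepend_to_path is a hypothetical helper, and the folder is assumed from the path above.

```python
import os
import shutil


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for the current process only."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)


if __name__ == "__main__":
    # Assumed install folder; change it to match your machine.
    prepend_to_path(r"C:\Program Files (x86)\Tesseract-OCR")
    # Prints the resolved executable, or None if it still cannot be found.
    print(shutil.which("tesseract"))
```

The change is gone when the script exits. To make it permanent you still need to edit PATH in the system's environment variable settings and open a new terminal.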
Upvotes: 1
Reputation: 752
You need to tell pytesseract where the tesseract binary is located:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
Doing this should solve your problem.
Upvotes: 0