zlk2000

raise TesseractError(proc.returncode, get_errors(error_string))

I am trying to extract text from an image using the pytesseract module in Python, but I keep getting an error when I run the code below. There is a similar question with this answer (https://stackoverflow.com/a/54914105/12642523), which I tried, but I still get the same error. Any tips?

import pytesseract as py
from PIL import Image
cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(img)

---------------------------------------------------------------------------
TesseractError                            Traceback (most recent call last)
<ipython-input-86-5e06d7c425c6> in <module>
      3 cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
      4 img=r"C:\Python\Images to text\databases.jpg"
----> 5 py.image_to_string(img)

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
    346         Output.DICT: lambda: {'text': run_and_get_output(*args)},
    347         Output.STRING: lambda: run_and_get_output(*args),
--> 348     }[output_type]()
    349 
    350 

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
    345         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    346         Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 347         Output.STRING: lambda: run_and_get_output(*args),
    348     }[output_type]()
    349 

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    256         }
    257 
--> 258         run_tesseract(**kwargs)
    259         filename = kwargs['output_filename_base'] + extsep + extension
    260         with open(filename, 'rb') as output_file:

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    232     with timeout_manager(proc, timeout) as error_string:
    233         if proc.returncode:
--> 234             raise TesseractError(proc.returncode, get_errors(error_string))
    235 
    236 

TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

Answers (2)

Burugu Anudeep

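On Debian/Ubuntu (for example in a Jupyter or Colab cell, where the ! prefix runs a shell command), install Tesseract:

! apt install tesseract-ocr
! apt install libtesseract-dev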

Then point pytesseract at the tesseract binary itself (apt installs it as /usr/bin/tesseract), not at pytesseract:

pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'

This is needed because Tesseract is a separate binary installed on your computer, and pytesseract has to be pointed at it.
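A minimal sketch of locating the Tesseract binary instead of hard-coding a path, assuming Tesseract itself is installed (the find_tesseract helper and its candidate list are hypothetical, not part of pytesseract). It deliberately searches for tesseract rather than pytesseract: the error in the question, 'Usage: pytesseract [-l lang] input_file', appears to be the usage message of pytesseract's own command-line script, which suggests tesseract_cmd was pointing at the Python wrapper rather than at the OCR engine.

```python
import os
import shutil
from typing import Iterable, Optional


def find_tesseract(extra_candidates: Iterable[str] = ()) -> Optional[str]:
    """Return the path to the Tesseract OCR executable, or None if not found.

    PATH is searched first (shutil.which also honours PATHEXT on Windows,
    so "tesseract" matches tesseract.exe). Then each explicit candidate
    path is checked in order. This looks for the OCR engine itself, never
    the pytesseract wrapper script.
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or pass its install path as a candidate.")
    else:
        print("Set pytesseract.pytesseract.tesseract_cmd =", repr(path))
```

The returned path can then be assigned to pytesseract.pytesseract.tesseract_cmd. On Windows, if Tesseract is not on PATH, pass the location where you actually installed tesseract.exe as a candidate rather than guessing one.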

Sreekiran A R

You are passing the file path as a string instead of an image. Change the pytesseract call to:

from PIL import Image
import pytesseract as py
img = r"C:\Python\Images to text\databases.jpg"
py.image_to_string(Image.open(img))

Alternatively, you can use OpenCV to open the image; that works fine as well.

OpenCV can be installed with pip:

pip install opencv-python

Once it is installed, you can read an image and pass it to pytesseract:

import cv2
import pytesseract

# cv2.imread returns a NumPy array (BGR channel order), or None if the file cannot be read
image = cv2.imread('path/to/image.jpg')
text = pytesseract.image_to_string(image)
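A slightly more defensive version of the OpenCV approach, as a sketch (load_for_ocr is a hypothetical helper name, not part of OpenCV or pytesseract). It relies on two behaviours of OpenCV: cv2.imread returns None instead of raising when it cannot read a file, and it returns pixels in BGR order, which is converted to RGB here before OCR.

```python
import os


def load_for_ocr(path: str):
    """Read an image with OpenCV and return it as an RGB array for pytesseract.

    Raises FileNotFoundError if the path does not exist and ValueError if
    OpenCV cannot decode the file, instead of passing None on to pytesseract.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such image file: {path!r}")

    # Imported here so this module can be loaded even where OpenCV is absent.
    import cv2

    image = cv2.imread(path)
    if image is None:
        # cv2.imread signals failure by returning None rather than raising.
        raise ValueError(f"OpenCV could not read {path!r} as an image")

    # OpenCV stores channels as BGR; convert to RGB before running OCR.
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
```

Usage would then be pytesseract.image_to_string(load_for_ocr('path/to/image.jpg')). This only guards the image-loading step; a tesseract_cmd that points at the wrong executable is a separate problem.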
