Reputation: 19
Since yesterday I've been trying to use the OCR library pytesser. I solved a few problems by myself, but I can't figure out how to get rid of this one. Here is the error:
H:\Python27>python.exe lol.py
Traceback (most recent call last):
File "lol.py", line 30, in <module>
print image_to_string(image)
File "H:\Python27\lib\pytesser\__init__.py", line 30, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "H:\Python27\lib\pytesser\__init__.py", line 20, in call_tesseract
proc = subprocess.Popen(args)
File "H:\Python27\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "H:\Python27\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] Le fichier spécifié est introuvable
The last line says "the specified file cannot be found".
Here is how I set the Tesseract path in my __init__.py:
tesseract_exe_name = 'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract' # Name of executable to be called at command line
I really can't figure out why it can't open the file. There are two other things in my __init__.py as well: the image file and the text file. I tried creating my own and giving it their paths, with no success, but I think it creates them itself.
scratch_image_name = "outfile.bmp" # This file must be .bmp or other Tesseract-compatible format
scratch_text_name_root = "infile" # Leave out the .txt extension
These are the 3 files that are passed to Popen, so I imagine the error is there.
I hope I'm clear enough for you to understand the problem I have.
Edit: the code in lol.py is from this site; I just modified the URL: http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html
Upvotes: 0
Views: 1195
Reputation: 366003
This is the problem:
tesseract_exe_name = 'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract' # Name of executable to be called at command line
See the \t there? That's a single tab character, not a backslash character and a t character. And you only get away with \U, \T, \A, and \L because you got lucky and nobody had thought of a use for them yet by the time your version of Python came out. (Later versions of Python do actually have a use for \U.)
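A minimal sketch of the effect, using only the last two components of the path so the snippet runs on any Python version (the variable names `broken` and `intended` are made up for illustration):

```python
# In a normal string literal, backslash + t collapses into one tab character.
broken = 'Tesseract-OCR\tesseract'
# A raw literal keeps the backslash and the t as two separate characters.
intended = r'Tesseract-OCR\tesseract'

print(len(broken), len(intended))      # 22 23
print('\t' in broken)                  # True: there is a tab in the string
print('\\' in broken)                  # False: no backslash survived
print(broken == intended)              # False
```

Note that `repr(broken)` displays the tab as `\t`, so printing the repr can be misleading; comparing lengths or testing for a tab is a more reliable check.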
The solution is to do one of the following:
(1) Use a raw string literal
tesseract_exe_name = r'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract' # Name of executable to be called at command line
The r'…' means "don't treat backslashes specially".
(2) Escape all of your backslashes:
tesseract_exe_name = 'C:\\Users\\TyLo\\AppData\\Local\\Tesseract-OCR\\tesseract' # Name of executable to be called at command line
In a non-raw string literal, \\ means a single backslash, so \\t means a single backslash and a t.
(3) Use forward slashes instead:
tesseract_exe_name = 'C:/Users/TyLo/AppData/Local/Tesseract-OCR/tesseract' # Name of executable to be called at command line
Most Windows programs accept forward slashes. A few don't, and occasionally you need a \\.\ pathname that isn't legal with forward slashes, but otherwise, this works.
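As a quick sanity check, the three spellings can be compared directly. This is only a sketch: it uses the standard `ntpath` module (the Windows flavour of `os.path`, importable on any platform) to normalize the forward-slash version, and the variable names are illustrative.

```python
import ntpath  # Windows path rules, available on every platform

raw = r'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract'
escaped = 'C:\\Users\\TyLo\\AppData\\Local\\Tesseract-OCR\\tesseract'
forward = 'C:/Users/TyLo/AppData/Local/Tesseract-OCR/tesseract'

# Options (1) and (2) produce exactly the same string.
print(raw == escaped)                    # True
# Option (3) is a different string, but it normalizes to the same path.
print(ntpath.normpath(forward) == raw)   # True
# None of them contains a stray tab character.
print(any('\t' in p for p in (raw, escaped, forward)))  # False
```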
Upvotes: 3