Reputation: 3938
I am trying to use pypdfocr on Windows 7 with Python 2.7.
This is the error message I get when I run pypdfocr in cmd:
C:\Users\chamar.stu>pypdfocr F:\test2.pdf
Starting conversion of F:\test2.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
  File "c:\users\chamar.stu\appdata\local\continuum\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
... .... ....
pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
    ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'
It seems that I am missing Poppler or Xpdf, but I did install Poppler via PyGObject as suggested here. I've also added xpdf to my PATH environment variable as suggested here.
Any suggestions to get me out of this little mess?
Upvotes: 2
Views: 394
Reputation: 17541
Try downgrading Tesseract from version 4.0.0-beta.1 (in my case) to a 3.x release whose version string contains only digits and dots.
tesseract --version   # to check the installed version
The version check built into the pypdfocr package expects every dot-separated part of the version number to be an integer, hence the error on '00alpha' ('0-beta' in my case).
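To make the failure concrete, here is a minimal sketch. `strict_parse` reuses the expression shown in the traceback; `lenient_parse` is a hypothetical alternative written for illustration only and is not part of pypdfocr. The version string "3.05.00alpha" is an assumed example that would produce the '00alpha' component from the error message.

```python
import re


def strict_parse(ver_str):
    # Same expression as line 98 in the traceback: every component must be a pure integer.
    return [int(x) for x in ver_str.split('.')]


def lenient_parse(ver_str):
    # Hypothetical workaround (not pypdfocr code): keep the leading digits of each
    # component and stop at the first component that carries a non-numeric suffix.
    parts = []
    for component in ver_str.split('.'):
        match = re.match(r'(\d+)(.*)', component)
        if match is None:
            break
        parts.append(int(match.group(1)))
        if match.group(2):
            break
    return parts


# "3.05.00alpha" is an assumed example string; "4.0.0-beta.1" is the version from this answer.
for version in ("3.05.00alpha", "4.0.0-beta.1"):
    try:
        strict_parse(version)
    except ValueError as exc:
        print("strict parse failed for %s: %s" % (version, exc))
    print("lenient parse of %s: %s" % (version, lenient_parse(version)))
```

Downgrading remains the simpler fix, since it needs no change to the installed package.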
Upvotes: 0
Reputation: 43533
The pypdfocr script is probably calling the pdfimages program (one of the Poppler utilities, not the library) using the subprocess module.
I could not easily discern if the utilities were provided in the URI you mention.
If not, you can find pre-built ms-windows executables for the utilities e.g. here.
Make sure that the location where the Poppler utilities are installed is in your PATH, so that pypdfocr can find them.
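One way to verify this before running pypdfocr again is to search PATH from Python. The helper below, `find_on_path`, is a hypothetical function written for this answer (it is not part of pypdfocr) and uses only the standard library, so it should work on Python 2.7 as well as Python 3. It assumes the executables are named pdfimages and tesseract.

```python
import os


def find_on_path(program, path=None, pathext=None):
    """Return the full path of `program` if it is in a PATH directory, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        # On Windows, PATHEXT lists executable suffixes such as .EXE;
        # on other systems it is usually unset.
        pathext = os.environ.get("PATHEXT", "")
    suffixes = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for suffix in suffixes:
            candidate = os.path.join(directory, program + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


for tool in ("pdfimages", "tesseract"):
    location = find_on_path(tool)
    print("%s: %s" % (tool, location or "NOT FOUND on PATH"))
```

If pdfimages is reported as not found, add the folder containing pdfimages.exe to PATH and open a new cmd window, since a window that is already open keeps the old value.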
Upvotes: 1