Reputation: 5
I'm trying to import ocrmypdf on my company's client's Windows Server 2016 (Build 14393) machine using 32-bit Python 3.7. When I import the library in a Jupyter Notebook, it is unable to locate leptonica via ctypes.util.find_library().
Ocrmypdf is a Linux-developed Python 3 package. Per the documentation (https://ocrmypdf.readthedocs.io/en/latest/introduction.html) it does not support Windows. The suggested workarounds are a docker container and Windows Subsystem for Linux.
I would rather not use a Docker container, as neither I nor my coworkers are very experienced with it. I am unable to use WSL as it is not available for my build of Windows Server 2016 (see the troubleshooting subsection: https://learn.microsoft.com/en-us/windows/wsl/install-on-server).
This discussion (find_library() in ctypes) states that you can point ctypes.util.find_library to the needed library file by adding the folder that contains it to the Path environment variable. Conveniently, Tesseract OCR's Windows download includes liblept. Would editing the Path variable to point toward that folder be a dangerous thing to do?
Edit: I tried adding the path to Tesseract-OCR's folder on my laptop's environment Path and restarted Anaconda, etc. ocrmypdf still gave the same error. A closer read of that discussion brought up the point that find_library operates differently on Windows. A read of the documentation (https://docs.python.org/2.5/lib/ctypes-finding-shared-libraries.html) states that "On Windows, find_library searches along the system search path, and returns the full pathname, but since there is no predefined naming scheme a call like find_library("c") will fail and return None." Does this mean I have to hardcode in a name to use in order to find the library?
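The quoted behaviour can be illustrated with a small sketch. The helper `find_dll_on_path` is hypothetical (it is not part of ctypes); it only imitates a search that checks each directory for the given name, with and without a `.dll` suffix, and never guesses a `lib` prefix or a version suffix. The file name `liblept-5.dll` is assumed from the answer below; the DLL shipped with Tesseract may be named differently.

```python
import os
import tempfile


def find_dll_on_path(name, search_path):
    """Hypothetical helper: look for `name` or `name + '.dll'` in each directory."""
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
        if not candidate.lower().endswith(".dll"):
            candidate += ".dll"
            if os.path.isfile(candidate):
                return candidate
    return None


# A throwaway directory stands in for the folder that holds the DLL.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "liblept-5.dll"), "wb").close()
    short_name = find_dll_on_path("lept", folder)       # no file called lept or lept.dll
    exact_name = find_dll_on_path("liblept-5", folder)  # matches liblept-5.dll
    exact_basename = os.path.basename(exact_name)
```

If the real find_library behaves like this sketch on your machine, adding the folder to Path is not enough on its own: the name passed in also has to match the DLL's file name.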
This issue has been replicated, albeit on a different machine, here: https://github.com/jbarlow83/OCRmyPDF/issues/341. You can reproduce the issue by running the below code on a Windows machine.
!pip install ocrmypdf
import ocrmypdf
The expected result is that ocrmypdf is imported successfully and is usable. The actual result is:
OSError Traceback (most recent call last)
<ipython-input-2-a81f3474d7ad> in <module>
----> 1 import ocrmypdf
~\AppData\Local\Continuum\anaconda3\lib\site-packages\ocrmypdf\__init__.py in <module>
16 # along with OCRmyPDF. If not, see <http://www.gnu.org/licenses/>.
17
---> 18 from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
19 from ._version import PROGRAM_NAME, __version__
20 from .api import Verbosity, configure_logging, ocr
~\AppData\Local\Continuum\anaconda3\lib\site-packages\ocrmypdf\leptonica.py in <module>
40 logger = logging.getLogger(__name__)
41
---> 42 lept = ffi.dlopen(find_library('lept'))
43 lept.setMsgSeverity(lept.L_SEVERITY_WARNING)
44
OSError: cannot load library '<None>': error 0x57
Upvotes: 0
Views: 2355
Reputation: 373
I was able to get this working on Windows 10 by updating the PATH, and it works fine. I used msys2 to install it, hence the path name below. Change it to point to wherever your liblept-5.dll is located.
import os
if os.name == 'nt':
    # Run this before `import ocrmypdf`, since the library lookup happens at import time.
    os.environ['PATH'] = os.environ.get("PATH", "") + ';C:\\msys64\\mingw64\\bin'
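A slightly more defensive variant of the snippet above, as a sketch only: the helper `append_to_path` is a hypothetical name, and the directory is the msys2 location from this answer, which you would replace with your own. It avoids adding the same folder twice and reports whether the DLL is actually there.

```python
import os


def append_to_path(directory, env):
    """Hypothetical helper: append `directory` to env['PATH'] unless already listed."""
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.append(directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


if os.name == "nt":
    dll_dir = r"C:\msys64\mingw64\bin"  # path from this answer; adjust to your install
    append_to_path(dll_dir, os.environ)
    # Confirm the DLL exists before attempting `import ocrmypdf`.
    print("liblept-5.dll present:",
          os.path.isfile(os.path.join(dll_dir, "liblept-5.dll")))
```

Because the change is made through os.environ, it only affects the current Python process and does not touch the system-wide Path.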
Upvotes: 0