Spencer

Reputation: 5

Installing OCRmyPDF on Windows Server 2016 - Can't find liblept.dll. Is editing Path safe?

I'm trying to import ocrmypdf on my company's client's Windows Server 2016 (Build 14393) machine using Python 3.7 (32-bit). When I import the library in a Jupyter Notebook, it cannot locate Leptonica through ctypes.util.find_library().

OCRmyPDF is a Python 3 package developed on Linux. Per the documentation (https://ocrmypdf.readthedocs.io/en/latest/introduction.html), it does not support Windows. The suggested workarounds are a Docker container or Windows Subsystem for Linux (WSL).

I would rather not use a Docker container, since neither my coworkers nor I have much experience with it. I can't use WSL because it is not available for my build of Windows Server 2016 (see the troubleshooting subsection: https://learn.microsoft.com/en-us/windows/wsl/install-on-server).

This discussion (find_library() in ctypes) says you can help ctypes.util.find_library locate a library by adding the folder that contains it to the PATH environment variable. Conveniently, Tesseract OCR's Windows download includes liblept. Would adding that folder to PATH be a dangerous thing to do?
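One way to limit the risk is to change PATH only for the running Python process rather than editing the system-wide variable. Changes made through `os.environ` apply to the current process and any child processes it starts afterwards; they are not written back to the user or system environment, so other programs are unaffected. The sketch below is a minimal illustration under those assumptions: `append_to_path` is a hypothetical helper, and the Tesseract folder is an assumed install location.

```python
import os


def append_to_path(directory: str) -> None:
    """Append `directory` to PATH for this process only (hypothetical helper).

    Changes to os.environ are inherited by child processes started afterwards,
    but they are not saved to the user or system environment.
    """
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    # normcase lowercases on Windows, so the duplicate check is case-insensitive there.
    if os.path.normcase(directory) in (os.path.normcase(p) for p in entries):
        return
    # Appending (not prepending) keeps existing PATH entries first, so DLLs bundled
    # with Tesseract do not take priority over ones already found earlier on PATH.
    os.environ["PATH"] = os.pathsep.join(entries + [directory])


if os.name == "nt":
    # Assumed install folder; point this at wherever liblept*.dll actually lives.
    append_to_path(r"C:\Program Files\Tesseract-OCR")
```

This has to run before `import ocrmypdf`, because the library looks up Leptonica at import time.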

Edit: I added Tesseract-OCR's folder to the PATH environment variable on my laptop and restarted Anaconda, but ocrmypdf still gave the same error. A closer read of that discussion pointed out that find_library behaves differently on Windows. The documentation (https://docs.python.org/2.5/lib/ctypes-finding-shared-libraries.html) states that "On Windows, find_library searches along the system search path, and returns the full pathname, but since there is no predefined naming scheme a call like find_library("c") will fail and return None." Does this mean I have to hard-code the library's exact filename for it to be found?
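For reference, CPython 3.7's Windows implementation of `ctypes.util.find_library(name)` walks each directory in PATH and returns the first file named exactly `name` or `name + ".dll"`; it does not add a `lib` prefix or a version suffix. So `find_library('lept')` looks for `lept` or `lept.dll` and will not match a file called `liblept-5.dll`, whatever folders PATH contains. The sketch below mirrors that exact-name search but tries several candidate names. `find_dll_on_path` and the candidate names are hypothetical, not part of ctypes or OCRmyPDF.

```python
import os
from typing import Iterable, Optional


def find_dll_on_path(candidates: Iterable[str], path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first candidate DLL found on PATH, else None.

    Mirrors the exact-name search ctypes.util.find_library does on Windows,
    but tries several names (hypothetical helper, not a ctypes API).
    """
    names = list(candidates)  # materialize so generators survive the outer loop
    search = os.environ.get("PATH", "") if path is None else path
    for directory in search.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            # Append ".dll" when the candidate name doesn't already end with it.
            filename = name if name.lower().endswith(".dll") else name + ".dll"
            full = os.path.join(directory, filename)
            if os.path.isfile(full):
                return full
    return None


# Candidate names are assumptions; check the actual filename in your install folder.
print(find_dll_on_path(["liblept-5", "liblept", "lept"]))
```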

This issue has been reproduced on a different machine here: https://github.com/jbarlow83/OCRmyPDF/issues/341. You can reproduce it by running the code below on a Windows machine.

!pip install ocrmypdf
import ocrmypdf

The expected result is that ocrmypdf imports successfully in a usable form. Instead, the import fails with:

OSError                                   Traceback (most recent call last)
<ipython-input-2-a81f3474d7ad> in <module>
----> 1 import ocrmypdf

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ocrmypdf\__init__.py in <module>
     16 # along with OCRmyPDF.  If not, see <http://www.gnu.org/licenses/>.
     17 
---> 18 from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
     19 from ._version import PROGRAM_NAME, __version__
     20 from .api import Verbosity, configure_logging, ocr

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ocrmypdf\leptonica.py in <module>
     40 logger = logging.getLogger(__name__)
     41 
---> 42 lept = ffi.dlopen(find_library('lept'))
     43 lept.setMsgSeverity(lept.L_SEVERITY_WARNING)
     44 

OSError: cannot load library '<None>': error 0x57
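The message shows that `find_library('lept')` returned `None` and that `None` was passed straight to `ffi.dlopen`; error 0x57 is 87 in decimal, the Windows code ERROR_INVALID_PARAMETER. A guard like the one below would report the real problem (the library was never located) before `dlopen` is reached. This is an illustrative sketch: `require_library` is a hypothetical name, not something OCRmyPDF provides.

```python
from ctypes.util import find_library
from typing import Callable, Optional


def require_library(name: str,
                    finder: Callable[[str], Optional[str]] = find_library) -> str:
    """Return a path for `name`, or raise a clear error instead of passing None on.

    Hypothetical guard; per the traceback, OCRmyPDF's leptonica.py passes the
    result of find_library('lept') directly to ffi.dlopen.
    """
    path = finder(name)
    if path is None:
        raise OSError(
            f"Could not locate library {name!r}: find_library returned None. "
            "Check that the DLL's folder is on PATH and that its filename matches."
        )
    return path
```

With such a guard, the failing line would read `ffi.dlopen(require_library('lept'))` and stop with a message about the missing library rather than the opaque error 0x57.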

Upvotes: 0

Views: 2355

Answers (1)

zethw

Reputation: 373

I got this working on Windows 10 by updating PATH. I installed it with MSYS2, which is where the path below comes from. Change it to point to the folder where your liblept-5.dll is located, and run it before importing ocrmypdf.

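import os
if os.name == 'nt':  # Windows only: make the folder holding liblept-5.dll searchable before the import
    os.environ['PATH'] = os.environ.get('PATH', '') + os.pathsep + r'C:\msys64\mingw64\bin'
import ocrmypdf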

Upvotes: 0
