user1812076

Reputation: 279

pytesseract error when used with mod_wsgi

I am trying to set up an OCR web service so that I can send images for processing from multiple locations.

I have never done anything with CGI, so I decided it was time to try mod_wsgi. It took me about two days to install all the libraries, including OpenCV and pytesseract. My OCR works fine when I run it the "normal" way (from a new Python interpreter session). I had a lot of trouble getting some libraries to work under mod_wsgi, even though they work normally otherwise.

I got stuck at pytesseract. If I run Tesseract from the command line with:

tesseract -l myl image.jpe out

Everything works fine.

Even if I do it like this:

import pytesseract
from PIL import Image

pytesseract.image_to_string(Image.open('/var/www/path/image.jpe'), lang='myl')

This works as well.

When I run the same code under mod_wsgi, I get this error in my httpd log file:

mod_wsgi (pid=1836): Exception occurred processing WSGI script '/var/www/path/app.wsgi'.
[Mon May 18 06:28:31 2015] [error] [client IP] Traceback (most recent call last):
[Mon May 18 06:28:31 2015] [error] [client IP]   File "/var/www/path/app.wsgi", line 28, in wsgi_app
[Mon May 18 06:28:31 2015] [error] [client IP]     output = check_text('a.jpe')
[Mon May 18 06:28:31 2015] [error] [client IP]   File "/var/www/path/app.wsgi", line 20, in check_text
[Mon May 18 06:28:31 2015] [error] [client IP]     return pytesseract.image_to_string(Image.open('/var/www/path/a.jpe'), lang='myl')
[Mon May 18 06:28:31 2015] [error] [client IP]   File "/usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 161, in image_to_string
[Mon May 18 06:28:31 2015] [error] [client IP]     boxes=boxes,
[Mon May 18 06:28:31 2015] [error] [client IP]   File "/usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 94, in run_tesseract
[Mon May 18 06:28:31 2015] [error] [client IP]     stderr=subprocess.PIPE)
[Mon May 18 06:28:31 2015] [error] [client IP]   File "/usr/local/lib/python2.7/subprocess.py", line 710, in __init__
[Mon May 18 06:28:31 2015] [error] [client IP]     errread, errwrite)
[Mon May 18 06:28:31 2015] [error] [client IP]   File "/usr/local/lib/python2.7/subprocess.py", line 1335, in _execute_child
[Mon May 18 06:28:31 2015] [error] [client IP]     raise child_exception
[Mon May 18 06:28:31 2015] [error] [client IP] OSError: [Errno 2] No such file or directory

Here is my app.wsgi file:

#!/usr/local/bin/python2.7
#-*- coding: utf-8 -*-

import os
import sys
from subprocess import check_output



sys.path.append('/var/www/path')

import pytesseract
from PIL import Image

def check_text(image_path):
#   return check_output(['pytesseract', '-l', 'myl', '/var/www/path/a.jpe'])
        return pytesseract.image_to_string(Image.open('/var/www/path/a.jpe'), lang='myl')


def wsgi_app(environ, start_response):
        output = check_text('a.jpe')
        status = '200 OK'
        headers = [('Content-type', 'text/plain'), ('Content-Length', str(len(output)))]
        start_response(status, headers)
        # WSGI expects an iterable of byte strings, so wrap the result in a list
        return [output]

# mod_wsgi needs the *application* variable to serve our small app
application = wsgi_app

As you can see in the source, I've also tried check_output from subprocess to start a new pytesseract process myself, but I get the same error.

I built Tesseract and mod_wsgi from source. Still, I'm fairly sure the problem is related to mod_wsgi, since the same code works when I run it normally in Python.
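
One way to test that suspicion is to compare what the two processes can actually see. The following is a minimal sketch using only the standard library. The helper names find_executable and describe_environment are hypothetical and are not part of mod_wsgi or pytesseract. It reports the PATH of the current process and whether a given executable can be resolved through it:

```python
import os


def find_executable(name, path=None):
    """Return the path of `name` if an executable file with that name is
    found in one of the directories listed in `path`, otherwise None."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def describe_environment(name="tesseract"):
    """Build a plain-text report of what this process can see."""
    return "PATH=%s\n%s resolves to: %s\n" % (
        os.environ.get("PATH", "(unset)"),
        name,
        find_executable(name),
    )
```

Running describe_environment() once in the interactive interpreter and once from inside the WSGI handler (for example by returning its result as the response body) should show whether the two environments differ.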

UPDATE: I had a similar "strange" problem with mod_wsgi and opencv. The question and answer can be found here: Occasional ctypes error importing numpy from mod_wsgi django app

Any suggestions would be appreciated.

Upvotes: 0

Views: 400

Answers (1)

user1812076

Reputation: 279

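To solve it, I edited /usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py and changed the line tesseract_cmd = 'tesseract' to tesseract_cmd = '/usr/local/bin/tesseract'. With the bare name, subprocess has to look the executable up on PATH. The fact that the absolute path works suggests that the PATH seen by the Apache/mod_wsgi process does not include /usr/local/bin, which would explain the OSError: [Errno 2] No such file or directory.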
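
The same effect can probably be achieved without editing the installed package, so the fix survives a reinstall. Below is a minimal sketch, assuming the binary lives in /usr/local/bin as above. ensure_on_path is a hypothetical helper name, not something pytesseract or mod_wsgi provides. It extends the PATH of the WSGI process itself, which subprocess then uses to locate tesseract:

```python
import os


def ensure_on_path(directory):
    """Prepend `directory` to this process's PATH if it is not already listed.

    Returns the resulting PATH string.
    """
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        entries.insert(0, directory)
        os.environ["PATH"] = os.pathsep.join(entries)
    return os.environ.get("PATH", "")


# Call once at import time in app.wsgi, before the first OCR request.
# Assumption: tesseract was installed to /usr/local/bin.
ensure_on_path("/usr/local/bin")

# Alternative with the same intent as the edit described above: override the
# module-level variable at runtime instead of changing the installed file.
# import pytesseract
# pytesseract.pytesseract.tesseract_cmd = "/usr/local/bin/tesseract"
```

Either variant keeps the change inside the application, so it is not lost when pytesseract is upgraded.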

Upvotes: 1
