n1c9

Reputation: 2697

Trouble installing tesseract-ocr package - "compile failed with error code 1 in /tmp/pip_build_root/tesseract-ocr"

I'm trying to install the tesseract-ocr package for use with pytesseract and am running into an odd issue. Installing everything else with pip worked, but when I ran sudo pip install tesseract-ocr as instructed here, I got the following errors:

Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/tesseract-ocr/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-zsaPkE-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_root/tesseract-ocr
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    load_entry_point('pip==1.5.4', 'console_scripts', 'pip')()
  File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 235, in main
    return command.main(cmd_args)
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 161, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)

I have a feeling that the traceback is causing the UnicodeDecodeError. Does anyone have any ideas on how to resolve this?
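For what it's worth, byte 0xe2 is the lead byte of three-byte UTF-8 sequences such as curly quotes, so the UnicodeDecodeError looks like a secondary failure while pip joins its log, not the cause of the failed compile. A minimal sketch of the same decode error follows. The byte string is an assumed example, not the actual log, and it is written for Python 3 rather than the Python 2.7 in the traceback.

```python
# Assumed example bytes: a message containing UTF-8 curly quotes (U+2018, U+2019).
raw = b"error: \xe2\x80\x98foo\xe2\x80\x99 undeclared"


def try_ascii(data):
    """Return the decoded text, or the error message if ASCII decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


ascii_result = try_ascii(raw)     # fails on the first 0xe2 byte
utf8_result = raw.decode("utf-8")  # succeeds, curly quotes preserved

print(ascii_result)
print(utf8_result)
```

Decoding as UTF-8 succeeds, which suggests the log contains ordinary non-ASCII text and the underlying build failure is a separate problem.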

Upvotes: 3

Views: 6241

Answers (1)

Eddie

Reputation: 86

The link provided only mentions using pip to install pytesseract, not Tesseract-OCR.

As mentioned there, you will also need the Python Imaging Library (PIL). If it is not installed on your system, you can install Pillow instead with sudo pip install pillow.

Tesseract-OCR cannot be installed with sudo pip install tesseract-ocr because it is not a Python module like pytesseract. From what I can see, Tesseract-OCR is written mostly in C++.
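Since Tesseract-OCR is a separate program and not a Python module, one way to confirm it is installed is to look for its executable on the PATH. This is a minimal sketch using only the standard library. The helper names find_tesseract and tesseract_version are made up for illustration, and it assumes the executable is named tesseract.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary)


def tesseract_version(path):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the version to stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it with your system package manager")
else:
    print(path, "->", tesseract_version(path))
```

If this reports that tesseract is missing, pip will not fix it; the program has to come from the operating system's package manager or an installer.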

The link given, http://code.google.com/p/tesseract-ocr/, no longer hosts Tesseract-OCR; the project has moved to https://github.com/tesseract-ocr/tesseract.

Install instructions can be found at https://github.com/tesseract-ocr/tesseract/wiki.

On Linux (Debian/Ubuntu), use sudo apt-get install tesseract-ocr, or sudo apt-get install tesseract-ocr-all to install all languages.

On Mac, use brew install tesseract, or brew install tesseract --all-languages to install all languages. You will need Homebrew installed; it can be found at https://brew.sh.

On Windows, an installer can be found at https://github.com/tesseract-ocr/tesseract/wiki/Downloads/. The current stable version should come with all languages included.
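After installing, you can check which languages actually ended up on your system with tesseract --list-langs. Here is a rough sketch that wraps that command. The helper names parse_langs and installed_langs are hypothetical, the sample text is an assumed illustration of the output shape (a header line ending in a colon, then one language code per line), and the exact header wording may differ between versions.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes, skipping blank lines and the header ending in ':'."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def installed_langs(binary="tesseract"):
    """Return the languages reported by `tesseract --list-langs`, or [] if not installed."""
    path = shutil.which(binary)
    if path is None:
        return []
    result = subprocess.run(
        [path, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # older builds may print the list to stderr
        universal_newlines=True,
    )
    return parse_langs(result.stdout)


# Assumed sample output, for illustration only.
sample = "List of available languages (3):\neng\nfra\nosd\n"
print(parse_langs(sample))
print(installed_langs())
```

An empty list from installed_langs() means either that tesseract is not on the PATH or that no language data was found.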

Upvotes: 4
