Michelle Santos

Reputation: 267

How to install Tesseract OCR on Databricks

I am trying to run the following script in a Databricks Python notebook:

pip install presidio-image-redactor
pip install pytesseract
python -m spacy download en_core_web_lg

from PIL import Image
from presidio_image_redactor import ImageRedactorEngine
import pytesseract

image = Image.open("images/ImageData.PNG")

engine = ImageRedactorEngine()

redacted_image = engine.redact(image, (255, 192, 203))

Upon running the last line, I'm getting the error below:

TesseractNotFoundError: tesseract is not installed or it's not in your PATH.

Am I missing something?

Upvotes: 5

Views: 3710

Answers (1)

Alex Ott

Reputation: 87174

You can use %sh in a separate cell to execute shell commands on the driver node. To install Tesseract, run:

%sh apt-get -f -y install tesseract-ocr 
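pytesseract launches the executable named "tesseract" by default and raises TesseractNotFoundError when it cannot start it. Once the %sh cell has finished, you can confirm from Python that the binary is visible to the notebook process. The sketch below uses only the standard library's shutil.which. The helper name find_tesseract is hypothetical.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract binary if it is on PATH, else None.

    pytesseract calls the same executable name ("tesseract") by default, so if
    this returns None in the notebook, pytesseract will raise
    TesseractNotFoundError.
    """
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; run the %sh apt-get cell first")
    else:
        print(f"tesseract found at {path}")
```

If the binary was installed somewhere that is not on PATH, pytesseract also lets you point at it explicitly by setting pytesseract.pytesseract.tesseract_cmd to the full path. A driver-only install is sufficient for the question's code because ImageRedactorEngine runs in the notebook's Python process on the driver. Worker nodes need Tesseract only if OCR runs inside Spark tasks.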

If you need Tesseract on all nodes of the cluster, use a cluster init script with the same command (without %sh).
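A cluster init script is simply a shell script that runs on every node at startup. The sketch below generates its contents from a notebook. Several details are assumptions that are not part of the answer: the helper name build_tesseract_init_script, the added apt-get update line (which refreshes the package index before installing), and the target path in the commented-out dbutils.fs.put call. Store the script wherever your workspace permits init scripts, then reference it in the cluster configuration. Recent Databricks versions favor workspace files or Unity Catalog volumes over DBFS for this.

```python
def build_tesseract_init_script() -> str:
    """Return a bash init script that installs Tesseract on each cluster node.

    This is the answer's apt-get command without the %sh prefix. The
    "apt-get update" line is an added assumption to make sure the package
    index is fresh; remove it if your runtime image does not need it.
    """
    lines = [
        "#!/bin/bash",
        "set -e",  # fail the init script (and cluster start) on any error
        "apt-get update",
        "apt-get -f -y install tesseract-ocr",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_tesseract_init_script()
    print(script)
    # In a Databricks notebook, dbutils is predefined. The path below is
    # hypothetical; choose a location your workspace allows for init scripts.
    # dbutils.fs.put("dbfs:/hypothetical/init/install-tesseract.sh", script, True)
```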

Upvotes: 6
