Reputation: 401
I'm trying to install the Tesseract library in a Docker image, but I keep getting errors.
I know many people have asked the same question, and I've tried many of the suggested solutions, but the build still fails. Here's the Dockerfile:
FROM python:3.7.6
RUN file="$(apt-get update && \
apt-get install -y apt-utils && \
apt-get install -y curl && \
apt-get update && \
apt-get install -y software-properties-common && \
apt-get update && \
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4BCBD87CEF9E52D && \ # this line added after search
add-apt-repository ppa:alex-p/tesseract-ocr -y )" && echo $file
RUN file = "$(apt-get update --allow-unauthenticated && \
apt install tesseract-ocr=4.1.1-1ppa1~xenial1 -y )" && echo "------ New line" && echo $file
The output:
Warning: apt-key output should not be parsed (stdout is not a terminal)
gpg: key A4BCBD87CEF9E52D: public key "Launchpad PPA for Alex_P" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: keybox '/tmp/tmpqo2rec69/pubring.gpg' created
gpg: /tmp/tmpqo2rec69/trustdb.gpg: trustdb created
gpg: key A4BCBD87CEF9E52D: public key "Launchpad PPA for Alex_P" imported
gpg: Total number processed: 1
gpg: imported: 1
Warning: apt-key output should not be parsed (stdout is not a terminal)
gpg: no valid OpenPGP data found.
It then prints a list of installed libraries, followed by this error:
step : RUN file = "$(apt-get update --allow-unauthenticated &&
apt install tesseract-ocr=4.1.1-1ppa1~xenial1 -y )" &&
echo "------ New line" && echo $file
---> Running in 8d364f24dfd9
E: The repository 'http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu focal Release' does not have a Release file.
Upvotes: 4
Views: 9876
Reputation: 401
After a lot of searching, I solved the problem by using an Ubuntu base image and installing Python 3.7 on it. As stated in @Amitp's answer, the Python base image is based on Debian, not Ubuntu, while the PPA targets Ubuntu.
The Dockerfile for the solution:
FROM ubuntu:16.04
USER root
RUN file="$(apt-get update && \
apt-get install -y apt-utils && \
apt-get install -y curl && \
apt-get update && \
apt-get install -y software-properties-common && \
apt-get update && \
add-apt-repository ppa:deadsnakes/ppa -y && \
apt update && \
apt install -y python3.7 && \
curl https://bootstrap.pypa.io/get-pip.py | python3.7 &&\
apt-get update)" && echo $file
Upvotes: 1
Reputation: 437
From what I could find out, the Docker image for Python 3.7.6 that you are using is based on Debian 10, not Ubuntu 16.04 (xenial). The repository that you are trying to add (ppa:alex-p/tesseract-ocr) is for Ubuntu (https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr). Installing tesseract with an Ubuntu 16.04 (xenial) package version (4.1.1-1ppa1~xenial1) is therefore bound to fail on a Debian-based image.
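A quick way to confirm which distribution a base image uses is to read the `ID` field of `/etc/os-release`, which both Debian and Ubuntu ship. The sketch below is only illustrative; `distro_id` is a hypothetical helper name, not part of Docker or apt.

```shell
# distro_id is a hypothetical helper: print the ID field of an
# os-release style file (defaults to /etc/os-release).
distro_id() {
    # Source the file in a subshell so its variables do not leak
    # into the calling shell.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID" )
}
```

Run inside a container, `distro_id` should print `debian` for the `python:3.7.6` image and `ubuntu` for `ubuntu:16.04`. The raw file can also be read directly with `docker run --rm python:3.7.6 cat /etc/os-release`.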
You need to install a Debian package instead (https://tracker.debian.org/pkg/tesseract). I tried the Dockerfile below with version 4.0.0-2 of tesseract, and it worked.
FROM python:3.7.6
RUN file="$(apt-get update && \
apt-get install -y apt-utils && \
apt-get install -y curl && \
apt-get update && \
apt-get install -y software-properties-common && \
apt-get update && \
apt install tesseract-ocr=4.0.0-2 -y )" && echo $file
After checking the image that was created, I could confirm that tesseract was indeed installed:
# docker run -it 37755343ba30 bash
root@fdb06d9bdc4e:/# dpkg -l | grep tesseract
ii libtesseract4:amd64 4.0.0-2 amd64 Tesseract OCR library
ii tesseract-ocr 4.0.0-2 amd64 Tesseract command line OCR tool
ii tesseract-ocr-eng 1:4.00~git30-7274cfa-1 all tesseract-ocr language files for English
ii tesseract-ocr-osd 1:4.00~git30-7274cfa-1 all tesseract-ocr language files for script and orientation
root@fdb06d9bdc4e:/# exit
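The same check can be made without opening an interactive shell, for example as a last step of the image build. This is a minimal sketch, assuming the package puts a `tesseract` binary on the `PATH`; `require_cmd` is a hypothetical helper name.

```shell
# require_cmd is a hypothetical helper: succeed only if the named
# command can be found on the PATH, otherwise report and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing command: $1" >&2
    return 1
}

# Example use (left commented out so the snippet has no side effects):
# require_cmd tesseract && tesseract --version
```

Because the helper returns a non-zero status when the command is missing, using it in a `RUN` step would make the build fail early instead of producing an image without tesseract.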
Hope this answers your question.
Upvotes: 1