Titan

Reputation: 224

How to install Java for tabula inside a Docker container

I can't find anything related to my question.

I tried the Dockerfile below:

RUN apt-get update && apt-get install -y \
software-properties-common


RUN apt-get update && \
    apt-cache search openjdk && \
    apt-get install openjdk-8-jdk && \
    apt-get clean;

RUN apt-get update && \
    apt-get install ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f;

ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME  

#tabula.errors.JavaNotFoundError: `java` command is not found from this Python 
#process.Please ensure Java is installed and PATH is set for `java`

When I use import tabula, I get tabula.errors.JavaNotFoundError. Can someone please tell me how to get rid of this error in Docker?

UPDATE:

I'm using Flask and MongoDB. In the Flask app there is code responsible for reading PDF files, which uses tabula, and it needs Java, as the error says. I installed the other Python packages with Pipfile and Pipfile.lock:

RUN pip install pipenv
# Pipfile and Pipfile.lock both list the tabula package
COPY Pipfile .
COPY Pipfile.lock .
RUN PIPENV_VENV_IN_PROJECT=1 pipenv install --deploy

But I have no idea how to install Java for the tabula dependency.

FINAL UPDATE:

I replaced tabula with pdfplumber and it now works well. Thanks to everyone who tried to help me.
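A minimal sketch of table extraction with pdfplumber, which needs no Java runtime. The helper names extract_tables and clean_rows are illustrative assumptions, not the code actually used in this project.

```python
def extract_tables(pdf_path):
    """Return every table found in the PDF as a list of row lists."""
    # Imported here so the module can be loaded without pdfplumber installed.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns one list of rows per detected table.
            tables.extend(page.extract_tables())
    return tables


def clean_rows(table):
    """Replace None cells with empty strings and strip whitespace."""
    return [[(cell or "").strip() for cell in row] for row in table]
```

Empty cells come back as None, so a cleanup step such as clean_rows is usually worth running before the rows go into MongoDB or a DataFrame.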

Upvotes: 0

Views: 1776

Answers (2)

In case anyone is trying to achieve that and doesn't want to switch to another library, here is a way of making it work with Tabula.

By the way, I'm using Tabula in a Jupyter notebook here, but you just have to change the base image from Jupyter to Python to achieve what you want with Flask.

Your docker-compose.yml will look like this:

jupyter:
  container_name: jupyter_lab
  build: .
  ports: 
    - "8888:8888"
  environment: 
    - JUPYTER_ENABLE_LAB=yes
  volumes: 
    - ./work:/home/jovyan/work

This is your Dockerfile:

# Base image: jupyter/minimal-notebook (the basic one)
FROM jupyter/minimal-notebook

# Change to root user to install java 8
USER root

# Install java 8
RUN apt-get update \
    && echo "Updated apt-get" \
    && apt-get install -y openjdk-8-jre \
    && echo "Installed openjdk 8"
  
# Install requirements
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

RUN rm -rf requirements.txt

# Switch back to $NB_UID so the image runs as a non-root user by default
USER $NB_UID

Your requirements.txt

tabula-py

Create the folder "work" (that is the folder that is going to be synced with the container).

Now open the terminal and type docker-compose up --build. This command will build and start your container. With these steps, you should be ready to go.

Test with this line of code in the notebook:

from tabula import read_pdf
pdf = read_pdf("path_to_your_pdf.pdf", pages='all')
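If the import or the call still raises JavaNotFoundError, it can help to check whether the Python process can actually see a java executable on its PATH. A minimal sketch using only the standard library; the helper names find_java and java_version are made up for illustration and are not part of tabula:

```python
import shutil
import subprocess


def find_java(executable="java"):
    """Return the full path of the java executable on PATH, or None."""
    return shutil.which(executable)


def java_version(executable="java"):
    """Return the output of `java -version`, or None if java is not found."""
    path = find_java(executable)
    if path is None:
        return None
    # `java -version` writes its banner to stderr, so capture both streams.
    result = subprocess.run(
        [path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    version = java_version()
    if version is None:
        print("java was not found on PATH")
    else:
        print(version)
```

If this prints that java was not found, the Java install step did not run in the image that is actually being used, or the java binary is not on the PATH of the user running Python.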

Upvotes: 0

Noam Yizraeli

Reputation: 5404

Generally, one should refrain from using a container image with more than one main process, such as Python and Java. I would personally advise finding a replacement for tabula-py that doesn't require a Java environment, since that is the best practice when using containers, as specified here:

It is generally recommended that you separate areas of concern by using one service per container.

With that in mind, because I don't know whether you can do that, I'm going to provide an alternative as well.

This Docker image packs multiple runnable environments, such as Java and Python, into one, and its Dockerfile is listed here. Because it includes more environments than you need, you can slim it down to fit your needs.

There is also this project, though it hasn't been updated for a while, or this article describing a concise home-brewed Python and Java Dockerfile.

Upvotes: 1
