Jaydog

Reputation: 622

How can I create a Docker image to run both Python and R?

I want to containerise a pipeline of code that was predominantly developed in Python but has a dependency on a model that was trained in R. There are some additional dependencies on the requirements and packages needed for both codebases. How can I create a Docker image that allows me to build a container that will run this Python and R code together?

For context, I have R code that runs a model (a random forest), but it needs to be part of a data pipeline that was built in Python. The Python pipeline does some processing first and generates the input for the model, then executes the R code with that input, and finally passes the output on to the next stage of the Python pipeline.

So I've created a template for this process by writing a simple test Python function that calls an R script ("test_call_r.py", which imports the subprocess package), and I need to put this in a Docker container with the necessary requirements and packages for both Python and R.
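
For illustration, a minimal sketch of what such a test script might look like (not the actual file). It assumes Rscript is on the PATH inside the container and that the R script reads its input and output paths from commandArgs(); the names run_script, run_r_model and myscript.R are placeholders.

```python
import subprocess


def run_script(interpreter, script_args, timeout=300):
    """Run an external script and return whatever it wrote to stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero,
    so a failing R step stops the pipeline instead of passing silently.
    """
    result = subprocess.run(
        [interpreter, *script_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6 too)
        check=True,
        timeout=timeout,
    )
    return result.stdout


def run_r_model(input_path, output_path, r_script="myscript.R"):
    # Assumption: the R script takes the input and output file paths
    # as positional arguments and reads them with commandArgs().
    return run_script("Rscript", [r_script, input_path, output_path])
```

The Python stage would write the model input to a file, call something like run_r_model("input.csv", "output.csv"), and then read the output file back in for the next stage.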

I have been able to build the Docker container for the Python pipeline itself, but cannot successfully install R and the associated packages alongside the Python requirements. I want to rewrite the Dockerfile to create an image to do this.

From the Docker Hub documentation I can create an image for the Python pipeline using, e.g.,

FROM python:3
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD [ "python", "./test_call_r.py" ]

Similarly, from Docker Hub I can use a base R image (or Rocker) to create a Docker container that can run a randomForest model, e.g.,

FROM r-base
WORKDIR /app
COPY myscripts /app/
RUN Rscript -e "install.packages('randomForest')"
CMD ["Rscript", "myscript.R"]

But what I need is to create an image that can install the requirements and packages for both Python and R, and execute the codebase to run R from a subprocess in Python. How can I do this?

Upvotes: 22

Views: 13675

Answers (3)

Leopoldo Varela

Reputation: 364

Being specific about both the Python and R versions will save you future headaches. This approach, for instance, pins R to 4.0.3 through the base image tag and asks for Python 3.8:

FROM r-base:4.0.3
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential libpq-dev python3.8 python3-pip python3-setuptools python3-dev
RUN pip3 install --upgrade pip

ENV PYTHONPATH="${PYTHONPATH}:/app"
WORKDIR /app

COPY requirements.txt .
COPY requirements.r .

# installing python libraries
RUN pip3 install -r requirements.txt

# installing r libraries
RUN Rscript requirements.r

Your requirements.r file should then look like:

install.packages('data.table')
install.packages('jsonlite')
...
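
One thing to watch for: install.packages() only emits a warning when a package fails to install, so the RUN Rscript requirements.r step can succeed even though a package is missing. A rough sketch of a check you could run afterwards, relying on the fact that library() raises an error (and Rscript exits non-zero) when a package cannot be loaded. The helper names here are made up for the example.

```python
import subprocess


def r_load_command(package):
    # Build an Rscript call that tries to load one package.
    # Rscript exits non-zero if library() cannot find it.
    return ["Rscript", "-e", 'library("{}")'.format(package)]


def failed_packages(packages, run=subprocess.call):
    """Return the packages that could not be loaded.

    `run` takes a command list and returns its exit code; it defaults to
    subprocess.call and is a parameter only so it can be swapped in tests.
    """
    return [p for p in packages if run(r_load_command(p)) != 0]
```

Calling failed_packages(["data.table", "jsonlite"]) inside the container and exiting with a non-zero status when the list is not empty would make the build stop as soon as a package is missing.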

Upvotes: 6

Jaydog

Reputation: 622

The Dockerfile I built to run Python and R together, along with their dependencies, is:

FROM ubuntu:latest

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends build-essential r-base r-cran-randomforest python3.6 python3-pip python3-setuptools python3-dev

WORKDIR /app

COPY requirements.txt /app/requirements.txt

RUN pip3 install -r requirements.txt

RUN Rscript -e "install.packages('data.table')"

COPY . /app

The commands to build the image, run the container (naming it SnakeR here), and execute the code (from a second terminal, while the container is still running) are:

docker build -t my_image .
docker run -it --name SnakeR my_image
docker exec SnakeR /bin/sh -c "python3 test_call_r.py"

I treated it like an Ubuntu OS and built the image as follows:

  • set DEBIAN_FRONTEND=noninteractive to suppress the prompts (e.g. for choosing your location) during the R install;
  • update the apt package index with apt-get update;
  • set the installation options:
    • -y = answer yes to prompts asking whether to proceed (e.g. the disk space confirmation);
    • --no-install-recommends = install only the required dependencies, skipping recommended and suggested packages;
  • build-essential for the compilers and build tools needed on Ubuntu;
  • r-base for the R software;
  • r-cran-randomforest to force the package to be available (a separate install.packages call, like the one used for data.table, didn’t work for randomForest for some reason);
  • python3.6 for the Python version;
  • python3-pip so that pip can be used to install the requirements;
  • python3-setuptools so that pip can build packages that are distributed as source;
  • python3-dev so that the JayDeBeApi installation in the requirements succeeds (without it, the install behaves as if it were targeting Python 2 rather than 3);
  • specify the active “working directory” to be the /app location;
  • copy the requirements file that holds the Python dependencies (built from the virtual environment of the Python codebase, e.g., with pip freeze);
  • install the Python packages from the requirements file (pip3 for Python 3);
  • install the R packages (e.g. just data.table here);
  • copy the directory contents to the specified working directory /app.
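
As a quick sanity check after the build, something like the following could be run inside the container to confirm that both toolchains ended up on the PATH. This is only a sketch; the function name and the default list of tools are my own choices for the example.

```python
import shutil


def missing_tools(tools=("python3", "pip3", "Rscript")):
    """Return the command names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report(tools=("python3", "pip3", "Rscript")):
    # Print a short summary and return True when everything was found.
    missing = missing_tools(tools)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
        return False
    print("All tools found: " + ", ".join(tools))
    return True
```

Running report() in the container should list nothing as missing if the apt-get step installed everything.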

This is replicated from my blog post at https://datascienceunicorn.tumblr.com/post/182297983466/building-a-docker-to-run-python-r

Upvotes: 21

Dipayan

Reputation: 213

I made an image for my personal projects; you can use it if you want: https://github.com/dipayan90/docker-python-r

Upvotes: 2
