rgms

Reputation: 129

How do I 'copy' installed R packages from the 1st stage to the 2nd stage using a multi-stage build on an r-base image?

I'm trying to build an image based on r-base, following the multi-stage method. How can I copy the installed packages from the 1st stage into the 2nd stage, and nothing else?

The current file basically gives me a 'packageless' r-base image, so the packages installed in the 1st stage are 'lost' somewhere.

I think it has something to do with making and choosing the correct directories. This is a confusing part for me, since I'm fairly new to dockerizing applications.

Thanks for all your help!

Below is my current file:

# Base image
FROM rocker/r-base:latest AS stage1

## install binary, build and dependent packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-stringr \
libxml2-dev \
libssl-dev && \
echo "r <- getOption('repos');r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile && \
Rscript -e "install.packages(c('AzureStor'))"

##2nd stage, pulling 'fresh' base image
FROM rocker/r-base:latest

#COPY packages from 1st stage
COPY --from=stage1 /usr/local/lib/R/site-library /usr/local/lib/R/site-library

## create directories
RUN mkdir -p /script

#Copy scripts
COPY /script /script

## Set workdir
WORKDIR /script

Upvotes: 3

Views: 2597

Answers (1)

Ralf Stubner

Reputation: 26843

In the comments you note that you want to get rid of any excess 'weight'. That weight typically comes from having development tools and packages installed. The rocker/r-base image already brings in quite a bit of weight, since it has r-base-dev with its dependencies installed. However, we can avoid adding further weight by keeping only the run-time dependencies in the final image and getting rid of the build-time dependencies. Build-time dependencies that an R package does not need at run-time are typically development files such as header files for system libraries, e.g. you don't need the libxml2-dev package at run-time; the libxml2 package is enough. I see several possible approaches to this.

First, you could use binary packages for those packages that need compilation against system libraries. I have not checked the dependencies of AzureStor, but it might well be that all the required R packages exist as compiled Debian packages. These depend only on the run-time dependencies, which keeps the image size small and the build time short. Your Dockerfile would look something like this:

FROM rocker/r-base:latest

## install binary, build and dependent packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
    r-cran-pdftools \
    r-cran-dplyr \
    r-cran-stringr \
    r-cran-... \
    r-cran-... && \
    Rscript -e "install.packages(c('AzureStor'))" && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /tmp/*

## create directories
RUN mkdir -p /script 

#Copy scripts
COPY /script /script

## Set workdir
WORKDIR /script
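
To find out which r-cran-... packages to fill in, it can help to list the dependencies of AzureStor first. The throwaway Dockerfile below is only a sketch for that purpose and not part of the final image. It assumes a reasonably recent R where tools::package_dependencies() queries the configured CRAN repository when no package database is passed:

```dockerfile
# Throwaway image, used only to inspect dependencies (illustrative sketch)
FROM rocker/r-base:latest

# Print the recursive Depends/Imports/LinkingTo dependencies of AzureStor.
# Each listed package is a candidate for a matching r-cran-<name> Debian package.
RUN Rscript -e "print(tools::package_dependencies('AzureStor', recursive = TRUE))"
```

With BuildKit the output of RUN steps is collapsed by default, so building with docker build --progress=plain makes the printed list visible.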

Second, you could install both build-time and run-time dependencies before installing R packages from source and remove the build-time dependencies afterwards, all within one RUN instruction, so that they never end up in an image layer:

FROM rocker/r-base:latest

## install binary, build and dependent packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
    r-cran-pdftools \
    r-cran-dplyr \
    r-cran-stringr \
    libxml2-dev libxml2 \
    libssl-dev libssl1.1 && \
    Rscript -e "install.packages(c('AzureStor'))" && \
    apt-get purge --yes libxml2-dev libssl-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /tmp/*


## create directories
RUN mkdir -p /script 

#Copy scripts
COPY /script /script

## Set workdir
WORKDIR /script

Finally, you could use a multistage build with three stages:

  1. Add the run-time dependencies.
  2. Add the build-time dependencies and install packages into /usr/local/lib/R/site-library.
  3. Use 1. as base and add the packages from 2.

So something like this:

# Base image
FROM rocker/r-base:latest AS stage1

## install binary and run-time dependencies
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
    r-cran-pdftools \
    r-cran-dplyr \
    r-cran-stringr \
    libxml2 \
    libssl1.1 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /tmp/*

FROM stage1 AS stage2
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
    libxml2-dev \
    libssl-dev && \
    Rscript -e "install.packages(c('AzureStor'))"


FROM stage1

COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library

## create directories
RUN mkdir -p /script

#Copy scripts
COPY /script /script

## Set workdir
WORKDIR /script

I have personally used the first and second approaches. I have not tested the third approach, but I expect it to work as well.
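
Whichever approach you pick, a small smoke test at the end of the final stage can confirm that the packages actually load. The fragment below is a sketch and not something from the Dockerfiles above; the package names are simply the ones used in this question. As far as I know, on Debian-based R images the r-cran-* packages are installed into /usr/lib/R/site-library, while install.packages() writes to /usr/local/lib/R/site-library, so printing .libPaths() shows which directories R searches:

```dockerfile
# Illustrative addition, to be appended at the end of the final stage.

# Fail the build early if any required package cannot be loaded:
# library() raises an error for a missing package, Rscript then exits
# with a non-zero status, and the RUN step aborts the build.
RUN Rscript -e "for (p in c('pdftools', 'dplyr', 'stringr', 'AzureStor')) library(p, character.only = TRUE)"

# Print the library search path for debugging.
RUN Rscript -e "print(.libPaths())"
```

If the first RUN step fails in a multi-stage build, comparing the printed paths with the directory used in COPY --from is a good first check.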

Upvotes: 2
