CloudyTrees

Reputation: 721

Optimizing Docker with R/python dependencies

I originally posted this question on the Docker forum but haven't received any comments yet, so I am posting it here, given that there's much more traffic on SO.

https://forums.docker.com/t/using-multi-stage-docker-build-for-slimming-down-images-with-r-dependency/67967

In a single sentence: I'm trying to slim down my R/Python Docker image. Any suggestions are welcome. Thank you!


I'm building Docker images for some applications that have R dependencies, but the naive build process that I wrote (see stage 1 of the Dockerfile below) leads to what is, IMO, an inflated image size.

Therefore I'm thinking about using a multi-stage build, having read how effective it can be for shrinking image size.

Apparently, simply copying the R and Rscript binaries and the packages from the build stage won't work: I got the following error message, which indicates that I also need to copy the shared libraries they depend on.

/usr/lib/R/bin/R: line 238: /usr/lib/R/etc/ldpaths: No such file or directory
/usr/lib/R/bin/exec/R: error while loading shared libraries: libR.so: cannot open shared object file: No such file or directory

So my question is,

And a remotely related issue: would it be a similar scenario for Python dependencies as well?

Thanks!


Illustration with R

####### stage 1: build
FROM ubuntu:18.10 as builder

# update OS libs
ARG OS_LIBS="software-properties-common libcurl4-openssl-dev libssl-dev libxml2-dev gpg-agent gnupg"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get -qqy update --fix-missing && \
    apt-get -qqy full-upgrade && \
    apt-get -qqy install --no-install-recommends \
        ${OS_LIBS} && \
    apt-get autoremove --purge -y && apt-get autoclean -y && \
    rm -rf /var/cache/apt/* /var/lib/apt/lists/* /var/tmp/* /tmp/* /usr/share/man/?? /usr/share/man/??_*

# install base R
ARG R_RELEASE_VERSION="3.5.1"
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9 && \
    add-apt-repository "deb http://cran.rstudio.com/bin/linux/ubuntu cosmic-cran35/" && \
    apt-get -qqy update --fix-missing && \
    apt-get -qqy full-upgrade && \
    apt-get -qqy install --no-install-recommends \
        r-base-core="${R_RELEASE_VERSION}"-1build1 \
        r-base-dev="${R_RELEASE_VERSION}"-1build1
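# this is just a toy example
# create destdir first; the repos URL is an assumption (any CRAN mirror works)
RUN mkdir -p /tmp/R_pkg_download/ && \
    R --vanilla -e 'install.packages("data.table", repos = "https://cloud.r-project.org", destdir = "/tmp/R_pkg_download/", clean = TRUE)'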



####### stage 2: copy the binary and libs
FROM ubuntu:18.10

RUN mkdir -p /usr/lib/R \
             /usr/local/lib/R/site-library
COPY --from=builder /usr/bin/R /usr/bin/R
COPY --from=builder /usr/bin/Rscript /usr/bin/Rscript
COPY --from=builder /usr/lib/R /usr/lib/R
COPY --from=builder /usr/local/lib/R/site-library /usr/local/lib/R/site-library

Upvotes: 0

Views: 410

Answers (2)

CloudyTrees

Reputation: 721

Prompted by the answer from @Jan Garaj (the accepted answer, because it summarized the issue well), I dug deeper to see whether I could find data to support the claim.

Building an image with the following Dockerfile prints the directory sizes given further below as tables. The conclusion is, just as @Jan Garaj pointed out, that the original optimization idea is not worth it when it comes to slimming down the image size.


Dockerfile (note that this won't run out of the box, since I'm not including the R package installation script, install_R_packages.R, but that is trivial to write):

FROM ubuntu:18.10

ARG DEBIAN_FRONTEND=noninteractive

RUN du -sh --exclude=/proc /* 
RUN du -sh /usr/* && \
    du -sh /usr/lib/* && \
    du -sh /usr/local/*

ARG R_DEP_TRANSIENT="make gpg-agent gnupg"
ARG R_DEPENDENCIES="software-properties-common libcurl4-openssl-dev libssl-dev libxml2-dev ${R_DEP_TRANSIENT} g++"
RUN apt-get -qqy update --fix-missing && \
    apt-get -qqy full-upgrade && \
    apt-get -qqy install --no-install-recommends \
                 ${R_DEPENDENCIES} && \
    rm -rf /var/cache/apt/* /var/lib/apt/lists/* /var/tmp/* /tmp/* /usr/share/man/?? /usr/share/man/??_* && \
    du -sh --exclude=/proc /* 
RUN du -sh /usr/* && \
    du -sh /usr/lib/* && \
    du -sh /usr/local/* && \
    du -sh /usr/local/lib/*

ARG R_RELEASE_VERSION="3.5.1"
ARG SLIM_R_LIB_CMD="find .  -type d \\( -name \"help\" -o -name \"doc\" -o -name \"html\" -o -name \"htmlwidgets\" -o -name \"demo\" -o -name \"demodata\" -o -name \"examples\" -o -name \"exampleData\" -o -name \"unitTests\" -o -name \"tests\" -o -name \"testdata\" -o -name \"shiny\" \\) | xargs rm -rf"
ADD install_R_packages.R .
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9 && \
    add-apt-repository "deb http://cran.rstudio.com/bin/linux/ubuntu cosmic-cran35/" && \
    apt-get -qqy update --fix-missing && \
    apt-get -qqy full-upgrade && \
    apt-get -qqy install --no-install-recommends \
                 r-base-core="${R_RELEASE_VERSION}"-1build1 \
                 r-base-dev="${R_RELEASE_VERSION}"-1build1 && \
    mkdir -p /tmp/R_pkg_download/ && \
    Rscript install_R_packages.R && \
    cd "/usr/local/lib/R/site-library" && eval ${SLIM_R_LIB_CMD} && \
    cd "/usr/lib/R/site-library" && eval ${SLIM_R_LIB_CMD} && \
    apt-get -qqy purge \
                 ${R_DEP_TRANSIENT} && \
    apt-get -qqy autoremove --purge && apt-get -qqy autoclean && \
    rm -rf install_R_packages.R /tmp/R_pkg_download/ /var/cache/apt/* /var/lib/apt/lists/* /var/tmp/* /tmp/* /usr/share/man/?? /usr/share/man/??_* && \
    du -sh --exclude=/proc /* 
RUN du -sh /usr/* && \
    du -sh /usr/lib/* && \
    du -sh /usr/local/* && \
    du -sh /usr/local/lib/*
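
The `SLIM_R_LIB_CMD` step above pipes `find` into `xargs rm -rf`, which splits on whitespace in paths and keeps descending into directories that are about to be deleted. A minimal sketch of the same cleanup as a shell function, assuming a POSIX `find` (the function name `slim_r_lib` is hypothetical; the directory names are the ones from the Dockerfile):

```shell
#!/bin/sh
# Remove documentation, demo and test directories below an R library tree.
# -prune stops find from descending into a matched directory, and
# "-exec ... {} +" passes the matched paths to rm without word splitting.
slim_r_lib() {
    find "$1" -type d \( \
        -name help -o -name doc -o -name html -o -name htmlwidgets \
        -o -name demo -o -name demodata -o -name examples \
        -o -name exampleData -o -name unitTests -o -name tests \
        -o -name testdata -o -name shiny \
    \) -prune -exec rm -rf {} +
}

# Example (inside the image build):
# slim_r_lib /usr/local/lib/R/site-library
```

Note that, as in the original command, removing `help` and `html` makes the packages' help pages unavailable inside the container, so this only suits images that run R non-interactively.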


Tables

[Directory-size tables were images in the original post and are not reproduced here. Panels: General, "usr/lib/*", "usr/local/*", "usr/local/lib/*"]
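
The `du -sh` calls in the Dockerfile print sizes in glob order, which makes the largest directories hard to spot. A small sketch for ranking them, meant to be run inside a container of the built image (the function name `dir_sizes` is hypothetical, not part of the original build):

```shell
#!/bin/sh
# Print the disk usage (in KiB) of each direct child of a directory,
# largest first. Errors from unreadable entries are discarded.
dir_sizes() {
    du -sk "$1"/* 2>/dev/null | sort -rn
}

# Example: the ten largest entries under /usr/lib
# dir_sizes /usr/lib | head -n 10
```

Using `-k` instead of `-h` keeps the first column numeric, so `sort -rn` orders the entries correctly.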

Upvotes: 0

Jan Garaj

Reputation: 28716

Yes, you also need to copy the shared libraries (for example, the libR.so mentioned in the error), because the dynamically linked R binaries require them.

But this image size optimization isn't worth it unless you have a specific use case. The cost of the disk space saved is probably much lower than the value of the time you would spend on this optimization. In your case I would use a ready-made R image from rocker (rocker/r-ver), which provides proven R images for general R use.
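
As a rough illustration of how to find out which shared libraries a binary needs, here is a minimal sketch built on `ldd`, to be run in the builder stage. The function name `list_shared_libs` is hypothetical; the binary path in the example is the one from the error message in the question.

```shell
#!/bin/sh
# List the shared libraries a dynamically linked binary resolves to.
# ldd prints lines such as "libfoo.so.1 => /path/libfoo.so.1 (0x...)";
# only the resolved absolute path (third field) is kept. Libraries that
# ldd reports as "not found" are skipped by this filter.
list_shared_libs() {
    ldd "$1" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}

# Example (in the builder stage):
# list_shared_libs /usr/lib/R/bin/exec/R
```

Compiled R packages ship their own `.so` files with their own dependencies, so the same check would have to be repeated for each of them, which is part of why this approach is laborious.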

Upvotes: 2
