Requirement: install R packages in the Airflow image so that an R script can be executed from Airflow.
Tried: the following commands in the Dockerfile:
# Global Docker Build arguments
ARG AIRFLOW_VERSION=2.3.3
ARG PYTHON_RUNTIME_VERSION=3.8
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_RUNTIME_VERSION}
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
USER 0
RUN apt-get update && apt-get install -y r-base r-base-core && \
rm -r /var/lib/apt/lists/*
RUN R -e "install.packages('httr', repos='https://cran.rstudio.com/')"
RUN R -e "install.packages('jsonlite', repos='http://cran.rstudio.com/')"
Running the script then fails with:
Error in library(httr) : there is no package called ‘httr’
I also tried installing the packages from within the R script itself:
install.packages("httr")
install.packages("jsonlite")
library(httr)
library(jsonlite)
This fails with:
Running command: ['Rscript', '/opt/airflow/dags/r_scripts/R_script.R']
[2023-01-04, 16:57:58 UTC] {subprocess.py:85} INFO - Output:
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Installing package into ‘/usr/local/lib/R/site-library’
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - (as ‘lib’ is unspecified)
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Warning in install.packages("httr") :
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - 'lib = "/usr/local/lib/R/site-library"' is not writable
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Error in install.packages("httr") : unable to install packages
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Execution halted
[2023-01-04, 16:57:58 UTC] {subprocess.py:96} INFO - Command exited with return code 1
You are running into compilation errors that cause cascading failures: when a dependency fails to build, every package that depends on it fails too. When building an R image goes wrong, I start a container from the image as it stands just before the installation step and run the installation there interactively, because the docker build process often masks or hides the real error messages. If we do that, we'll see errors such as:
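A minimal sketch of that debugging step, assuming the same base image tag as the Dockerfile and that the container is started as root (mirroring `USER 0`). The function name `r_debug_shell` is hypothetical; the container is started with `bash` as its command, as in the `docker run ... bash` confirmation later in this answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open an interactive root shell in the Airflow base image
# so R package installation can be tried by hand and its full configure/compiler
# output read directly, instead of through `docker build` logs.
r_debug_shell() {
  # Assumed default: the same base image the Dockerfile builds FROM.
  local image="${1:-apache/airflow:2.3.3-python3.8}"
  # --user 0 mirrors `USER 0` in the Dockerfile; --rm discards the container on exit.
  docker run -it --rm --user 0 "$image" bash
}

# Inside the container, repeat the Dockerfile steps one at a time, e.g.:
#   apt-get update && apt-get install -y r-base r-base-core
#   R -e "install.packages('httr', repos='https://cran.rstudio.com/')"
```

Repeating the steps one at a time keeps each step's output on screen, so the first real failure (here, `curl` and `openssl` failing to configure) is easy to spot.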
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libcurl was not found. Try installing:
* deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
* rpm: libcurl-devel (Fedora, CentOS, RHEL)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘curl’
* removing ‘/home/airflow/R/x86_64-pc-linux-gnu-library/4.0/curl’
* installing *source* package ‘openssl’ ...
** package ‘openssl’ successfully unpacked and MD5 sums checked
** using staged installation
Using PKG_CFLAGS=
--------------------------- [ANTICONF] --------------------------------
Configuration failed because openssl was not found. Try installing:
* deb: libssl-dev (Debian, Ubuntu, etc)
* rpm: openssl-devel (Fedora, CentOS, RHEL)
* csw: libssl_dev (Solaris)
* brew: openssl (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
-------------------------- [ERROR MESSAGE] ---------------------------
tools/version.c:1:10: fatal error: openssl/opensslv.h: No such file or directory
1 | #include <openssl/opensslv.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’
* removing ‘/home/airflow/R/x86_64-pc-linux-gnu-library/4.0/openssl’
ERROR: dependencies ‘curl’, ‘openssl’ are not available for package ‘httr’
* removing ‘/home/airflow/R/x86_64-pc-linux-gnu-library/4.0/httr’
The downloaded source packages are in
‘/tmp/Rtmp9dyEo5/downloaded_packages’
When you see this, it should be clear: you have OS-level dependencies that are not met. While the CRAN page for httr is uninformative here, know that it relies on curl, which lists:
SystemRequirements: libcurl: libcurl-devel (rpm) or libcurl4-openssl-dev (deb).
If you dig a little deeper, you'll see that we need to add that package as well as libssl-dev.
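Collecting the `deb:` suggestions from such output can be scripted. This is a small sketch, not part of R or Docker: `deb_hints` is a hypothetical helper that assumes the ANTICONF messages keep the `* deb: <package> (...)` format seen in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read R installation output on stdin and print the Debian
# package names suggested by "ANTICONF" messages (lines such as
# "* deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)"), one per line, deduplicated.
deb_hints() {
  # grep exits 1 when nothing matches; `|| true` keeps that from failing callers
  # that use `set -o pipefail`.
  { grep -oE '\* deb: [^ ]+' || true; } | awk '{print $3}' | sort -u
}
```

Usage inside the debugging container: `R -e "install.packages('httr', repos='https://cran.rstudio.com/')" 2>&1 | tee install.log`, then `deb_hints < install.log`. For the log above it prints `libcurl4-openssl-dev` and `libssl-dev`, the two packages added to `apt-get install` below.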
I only tested against the first few lines of your Dockerfile, but I was able to build this and confirm that R sees these two new packages. (I also took the liberty of installing both packages in one RUN line, since fewer layers in the build is generally a good thing.)
ARG AIRFLOW_VERSION=2.3.3
ARG PYTHON_RUNTIME_VERSION=3.8
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_RUNTIME_VERSION}
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
USER 0
RUN apt-get update && apt-get install -y r-base r-base-core r-base-dev \
libcurl4-openssl-dev libssl-dev && \
rm -r /var/lib/apt/lists/*
RUN R -e "install.packages(c('httr', 'jsonlite'), repos='https://cran.rstudio.com/')"
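One reason the original build appeared to succeed is that `install.packages()` only emits a warning when a package fails to build, so `R -e` still exits 0 and `docker build` continues. A sketch of a check that makes the build fail instead, assuming `Rscript` is on the PATH in the image; `check_r_packages` is a hypothetical name, not an R or Airflow tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return non-zero if any of the given R packages cannot be
# loaded, so a RUN step that calls it fails the build instead of silently passing.
check_r_packages() {
  local pkg
  local missing=()
  for pkg in "$@"; do
    # library() raises an error, and Rscript exits non-zero, if the package is
    # missing; a missing Rscript binary also counts as a failure here.
    if ! Rscript -e "suppressPackageStartupMessages(library('${pkg}', character.only = TRUE))" >/dev/null 2>&1; then
      missing+=("$pkg")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "R packages that failed to load: ${missing[*]}" >&2
    return 1
  fi
  echo "All R packages load: $*"
}
```

The same effect is available inline in the Dockerfile without a helper, by chaining a load step after the install: `RUN R -e "install.packages(c('httr', 'jsonlite'), repos='https://cran.rstudio.com/')" && Rscript -e "library(httr); library(jsonlite)"`.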
The build:
$ docker build -t myimage .
[+] Building 20.5s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 474B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/apache/airflow:2.3.3-python3.8 0.7s
=> [1/3] FROM docker.io/apache/airflow:2.3.3-python3.8@sha256:3a17e765ce209eb6cc551518f3b7ad5e2126d509ca8bdd35232ed2d35f801049 0.0s
=> CACHED [2/3] RUN apt-get update && apt-get install -y r-base r-base-core r-base-dev libcurl4-openssl-dev libssl-dev && rm -r /var/lib/apt/lists/* 0.0s
=> [3/3] RUN R -e "install.packages(c('httr', 'jsonlite'), repos='https://cran.rstudio.com/')" 19.6s
=> exporting to image 0.2s
=> => exporting layers 0.1s
=> => writing image sha256:6271d7deb41211a7fa603a086f8cc24d1249d831546e2fec0293d87d14558312 0.0s
=> => naming to docker.io/library/myimage
Confirm:
$ docker run -it --rm myimage bash
The container is run as root user. For security, consider using a regular user account.
root@10f43b1bd4c2:/opt/airflow# ls /usr/local/lib/R/site-library/
R6 askpass curl httr jsonlite mime openssl sys
root@10f43b1bd4c2:/opt/airflow# R
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> packageVersion("httr")
[1] ‘1.4.4’
> packageVersion("jsonlite")
[1] ‘1.8.4’
> q("no")