Kar

Reputation: 1014

Airflow: RScript : Error in library(httr) : there is no package called ‘httr’

Requirement: Install R packages in the Airflow image so that an R script can be executed from Airflow.

Tried: The commands below in the Dockerfile

# Global Docker Build arguments
ARG AIRFLOW_VERSION=2.3.3
ARG PYTHON_RUNTIME_VERSION=3.8
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_RUNTIME_VERSION}
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
USER 0
RUN apt-get update && apt-get install -y r-base r-base-core && \
    rm -r /var/lib/apt/lists/*
    
RUN R -e "install.packages('httr', repos='https://cran.rstudio.com/')"
RUN R -e "install.packages('jsonlite', repos='http://cran.rstudio.com/')"

Error: Error in library(httr) : there is no package called ‘httr’

I also tried installing the packages from inside the R script itself:

install.packages("httr")
install.packages("jsonlite")
library(httr)
library(jsonlite)

This produced the following error:

Running command: ['Rscript', '/opt/airflow/dags/r_scripts/R_script.R']
[2023-01-04, 16:57:58 UTC] {subprocess.py:85} INFO - Output:
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Installing package into ‘/usr/local/lib/R/site-library’
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - (as ‘lib’ is unspecified)
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Warning in install.packages("httr") :
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO -   'lib = "/usr/local/lib/R/site-library"' is not writable
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Error in install.packages("httr") : unable to install packages
[2023-01-04, 16:57:58 UTC] {subprocess.py:92} INFO - Execution halted
[2023-01-04, 16:57:58 UTC] {subprocess.py:96} INFO - Command exited with return code 1

Upvotes: 0

Views: 754

Answers (1)

r2evans

Reputation: 160607

You are hitting compilation errors during package installation, and those failures cascade into the error you see. The docker build process often hides some of the real error messages, so when I have trouble building R images I start a container from the image as it stands just before the installation phase and run the installation by hand to see what's going on. If we do that, we'll see errors such as:

------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libcurl was not found. Try installing:
 * deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
 * rpm: libcurl-devel (Fedora, CentOS, RHEL)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘curl’
* removing ‘/home/airflow/R/x86_64-pc-linux-gnu-library/4.0/curl’
* installing *source* package ‘openssl’ ...
** package ‘openssl’ successfully unpacked and MD5 sums checked
** using staged installation
Using PKG_CFLAGS=
--------------------------- [ANTICONF] --------------------------------
Configuration failed because openssl was not found. Try installing:
 * deb: libssl-dev (Debian, Ubuntu, etc)
 * rpm: openssl-devel (Fedora, CentOS, RHEL)
 * csw: libssl_dev (Solaris)
 * brew: openssl (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
-------------------------- [ERROR MESSAGE] ---------------------------
tools/version.c:1:10: fatal error: openssl/opensslv.h: No such file or directory
    1 | #include <openssl/opensslv.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’
* removing ‘/home/airflow/R/x86_64-pc-linux-gnu-library/4.0/openssl’
ERROR: dependencies ‘curl’, ‘openssl’ are not available for package ‘httr’
* removing ‘/home/airflow/R/x86_64-pc-linux-gnu-library/4.0/httr’

The downloaded source packages are in
        ‘/tmp/Rtmp9dyEo5/downloaded_packages’

When you see this, it should be clear: you have OS-level dependencies that are not met. While the CRAN page for httr is uninformative here, know that it relies on curl, which lists:

SystemRequirements: libcurl: libcurl-devel (rpm) or libcurl4-openssl-dev (deb).

If you dive a little deeper (the openssl failure in the log above says as much), you'll see that we need to add libcurl4-openssl-dev as well as libssl-dev.
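The `* deb:` hint lines in the log above already name the Debian packages to install. As a minimal sketch, assuming the installation output has been captured as text, a small filter can pull those names out. The function name `suggest_deb_packages` is hypothetical and is not part of R or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Debian package names suggested by
# "* deb: <package> (...)" hint lines in an R installation log read from stdin.
suggest_deb_packages() {
    # Capture the first word after "* deb: "; any prefix on the line is ignored.
    # sort -u removes duplicates when the same hint appears more than once.
    sed -n 's/.*\* deb: \([^ ]*\).*/\1/p' | sort -u
}

# Example on two hint lines copied from the log above.
printf '%s\n' \
    ' * deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)' \
    ' * deb: libssl-dev (Debian, Ubuntu, etc)' |
    suggest_deb_packages
```

With BuildKit, something like `docker build --progress=plain . 2>&1 | suggest_deb_packages` should surface the same names from a full build log, provided the failing packages print hints in this format. Not every R package does, so treat an empty result as "no hints found" rather than "nothing missing".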

I only tested against the initial few portions of your Dockerfile, but I was able to build this and confirm that R sees both httr and jsonlite. (I also took the liberty of installing both packages in one RUN line, thinking that fewer layers in the build process was a good thing.)

ARG AIRFLOW_VERSION=2.3.3
ARG PYTHON_RUNTIME_VERSION=3.8
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_RUNTIME_VERSION}
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
USER 0
RUN apt-get update && apt-get install -y r-base r-base-core r-base-dev \
    libcurl4-openssl-dev libssl-dev && \
    rm -r /var/lib/apt/lists/*

RUN R -e "install.packages(c('httr', 'jsonlite'), repos='https://cran.rstudio.com/')"

The build:

$ docker build -t myimage .
[+] Building 20.5s (7/7) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 474B                                                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 2B                                                                                                                                                                               0.0s
 => [internal] load metadata for docker.io/apache/airflow:2.3.3-python3.8                                                                                                                                     0.7s
 => [1/3] FROM docker.io/apache/airflow:2.3.3-python3.8@sha256:3a17e765ce209eb6cc551518f3b7ad5e2126d509ca8bdd35232ed2d35f801049                                                                               0.0s
 => CACHED [2/3] RUN apt-get update && apt-get install -y r-base r-base-core r-base-dev     libcurl4-openssl-dev libssl-dev &&     rm -r /var/lib/apt/lists/*                                                 0.0s
 => [3/3] RUN R -e "install.packages(c('httr', 'jsonlite'), repos='https://cran.rstudio.com/')"                                                                                                              19.6s
 => exporting to image                                                                                                                                                                                        0.2s
 => => exporting layers                                                                                                                                                                                       0.1s
 => => writing image sha256:6271d7deb41211a7fa603a086f8cc24d1249d831546e2fec0293d87d14558312                                                                                                                  0.0s
 => => naming to docker.io/library/myimage

Confirm:

$ docker run -it --rm myimage bash
The container is run as root user. For security, consider using a regular user account.

root@10f43b1bd4c2:/opt/airflow# ls /usr/local/lib/R/site-library/
R6  askpass  curl  httr  jsonlite  mime  openssl  sys
root@10f43b1bd4c2:/opt/airflow# R

R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> packageVersion("httr")
[1] ‘1.4.4’
> packageVersion("jsonlite")
[1] ‘1.8.4’
> q("no")
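The `ls` check on the site-library directory above can also be scripted. This is a minimal sketch, assuming that each installed R package is a directory containing a `DESCRIPTION` file. The function name `missing_r_packages` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which R packages are absent from a library directory.
# Usage: missing_r_packages LIBDIR PKG...
# Prints the name of each missing package; returns 1 if any is missing, else 0.
missing_r_packages() {
    local libdir="$1"
    shift
    local pkg status=0
    for pkg in "$@"; do
        # An installed R package is a directory holding a DESCRIPTION file.
        if [ ! -f "$libdir/$pkg/DESCRIPTION" ]; then
            echo "$pkg"
            status=1
        fi
    done
    return "$status"
}
```

Inside the container this could be called as `missing_r_packages /usr/local/lib/R/site-library httr jsonlite`. A directory being present does not prove the package loads, so `Rscript -e 'library(httr)'` remains the stronger check. Either check is worth running at build time: `install.packages()` reports a failed installation as a warning rather than an error, so a `RUN R -e "install.packages(...)"` step can exit with status 0 even when nothing was installed, which is likely why the original image built without httr.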

Upvotes: 2
