Reputation: 750
I need to access an Azure SQL Database from an R notebook in Databricks. To do this I aimed to use the odbc package, which installed fine on my local instance of R.
I have tried to install the package to the cluster using Databricks' interface, which always fails. I have also tried the following code within a notebook:
install.packages("odbc")
which results in:
Installing package into ‘/databricks/spark/R/lib’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/odbc_1.1.6.tar.gz'
Content type 'application/x-gzip' length 288033 bytes (281 KB)
==================================================
downloaded 281 KB
* installing *source* package ‘odbc’ ...
** package ‘odbc’ successfully unpacked and MD5 sums checked
PKG_CFLAGS=
PKG_LIBS=-lodbc
<stdin>:1:17: fatal error: sql.h: No such file or directory
compilation terminated.
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because odbc was not found. Try installing:
* deb: unixodbc-dev (Debian, Ubuntu, etc)
* rpm: unixODBC-devel (Fedora, CentOS, RHEL)
* csw: unixodbc_dev (Solaris)
* brew: unixodbc (Mac OSX)
To use a custom odbc set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘odbc’
* removing ‘/databricks/spark/R/lib/odbc’
The downloaded source packages are in
‘/tmp/RtmpqHp2QM/downloaded_packages’
I have also tried installing from github:
library(devtools)
devtools::install_github("r-dbi/odbc")
Which gives a different error:
Downloading GitHub repo r-dbi/odbc@master
Installing 3 packages: assertthat, BH, Rcpp
Installing packages into ‘/databricks/spark/R/lib’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/assertthat_0.2.1.tar.gz'
Content type 'application/x-gzip' length 12742 bytes (12 KB)
==================================================
downloaded 12 KB
trying URL 'https://cloud.r-project.org/src/contrib/BH_1.69.0-1.tar.gz'
Content type 'application/x-gzip' length 12378154 bytes (11.8 MB)
==================================================
downloaded 11.8 MB
trying URL 'https://cloud.r-project.org/src/contrib/Rcpp_1.0.1.tar.gz'
Content type 'application/x-gzip' length 3661123 bytes (3.5 MB)
==================================================
downloaded 3.5 MB
* installing *source* package ‘assertthat’ ...
** package ‘assertthat’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (assertthat)
* installing *source* package ‘BH’ ...
** package ‘BH’ successfully unpacked and MD5 sums checked
** inst
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (BH)
* installing *source* package ‘Rcpp’ ...
** package ‘Rcpp’ successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c Date.cpp -o Date.o
g++ -I/usr/share/R/include -DNDEBUG -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c Module.cpp -o Module.o
g++ -I/usr/share/R/include -DNDEBUG -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c Rcpp_init.cpp -o Rcpp_init.o
g++ -I/usr/share/R/include -DNDEBUG -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c api.cpp -o api.o
g++ -I/usr/share/R/include -DNDEBUG -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c attributes.cpp -o attributes.o
g++ -I/usr/share/R/include -DNDEBUG -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c barrier.cpp -o barrier.o
g++ -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o Rcpp.so Date.o Module.o Rcpp_init.o api.o attributes.o barrier.o -L/usr/lib/R/lib -lR
installing to /databricks/spark/R/lib/Rcpp/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (Rcpp)
The downloaded source packages are in
‘/tmp/RtmpqHp2QM/downloaded_packages’
Error in processx::run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), :
System command error
In addition: Warning messages:
1: In install.packages("odbc") :
installation of package ‘odbc’ had non-zero exit status
2: In install.packages("odbc") :
installation of package ‘odbc’ had non-zero exit status
Any idea why this package will not install on Databricks when it works fine locally and when all other packages I have tried install on Databricks work fine using the same syntax?
Upvotes: 3
Views: 4513
Reputation: 3182
Best option to access a SQL Database is using the preinstalled JDBC connectivity (see the Documentation). If you want to use ODBC this requires (as mentioned in one of the comments) unix odbc. Good practice to install multiple packages is using init-scripts. Following python code is for creating an init-script for pyodbc installation.
script = """
sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
sudo pip install pyodbc
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
sudo curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql
"""
dbutils.fs.put("/databricks/init/pyodbc/pyodbc.sh", script, True)
Hope this helps.
Upvotes: 2