Zafar

Reputation: 2016

Load tables into SQL Server from SparklyR

As of October 2017, is it possible to write Sparklyr Spark Data Frames into SQL Server?

I got this error:

> DBI::dbWriteTable(con, "DZ_test", for_test)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘dbWriteTable’ for signature 
‘"Microsoft SQL Server", "character", "tbl_spark"’

I'm currently running:

There is something similar for Cassandra in the wonderful crassy package, but I'm not sure if there is a connector or method for SQL Server. Thanks!

SOLVED!!!

I used the JDBC driver for SQL Server. Since I have JDK 1.8, that meant using the driver for Java 8 on all nodes:

spark.jars.packages com.microsoft.sqlserver:mssql-jdbc:6.2.2.jre8

I put this in $SPARK_HOME/conf/spark-defaults.conf.
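
If you would rather not edit spark-defaults.conf, the same property can also be set from R through sparklyr's spark_config() before connecting. A minimal sketch (the "yarn-client" master value is just a placeholder for your cluster):

library(sparklyr)

# sketch: request the mssql-jdbc package at connect time instead of
# listing it in spark-defaults.conf
conf <- spark_config()
conf$spark.jars.packages <- "com.microsoft.sqlserver:mssql-jdbc:6.2.2.jre8"
sc <- spark_connect(master = "yarn-client", config = conf)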

I typically log into SQL Server using Kerberos authentication, but since that was not set up on all nodes of my test cluster, I created a SQL login for the database of interest (which required an administrator) and connected with that username/password.

# write the Spark DataFrame my_sdf to a SQL Server table named "my_sdf"
spark_write_jdbc(my_sdf, "my_sdf",
                 options = list(
                   url = paste0("jdbc:sqlserver://cwjensql10.cwjwin.local;",
                                "databaseName=HER_NILM;",
                                "user=HER;",
                                "password=Test1;"),
                   driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver")
)

If you want to use Kerberos authentication (and are running on Linux), it would look like this:

# same write, but authenticating with the Kerberos ticket of the calling user
spark_write_jdbc(my_sdf, "my_sdf",
                 options = list(
                   url = paste0("jdbc:sqlserver://cwjensql10.cwjwin.local;",
                                "databaseName=HER_NILM;",
                                "integratedSecurity=true;",
                                "authenticationScheme=JavaKerberos;"),
                   driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver")
)

For Windows, just don't include the authenticationScheme option.
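
To sanity-check the write, the table can be read straight back with spark_read_jdbc using the same connection options. A sketch reusing the SQL-login example above (sc is the spark_connect handle, and dbtable names the table just written):

# sketch: round-trip the freshly written table to verify it arrived
my_sdf_check <- spark_read_jdbc(
  sc, "my_sdf_check",
  options = list(
    url = paste0("jdbc:sqlserver://cwjensql10.cwjwin.local;",
                 "databaseName=HER_NILM;",
                 "user=HER;",
                 "password=Test1;"),
    dbtable = "my_sdf",
    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"))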

Big thanks to @user6910411!!!!

Upvotes: 3

Views: 925

Answers (1)

zero323

Reputation: 330203

You can use the sparklyr::spark_write_jdbc function to write data directly from Spark, without collecting it in R first. To make it work, you'll need an appropriate JDBC driver available on the driver and worker nodes.
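
For contrast, the error in the question comes from handing DBI::dbWriteTable a tbl_spark. The DBI route only works after collecting the Spark table into a local data frame, which is impractical for anything large; a sketch:

# sketch: the DBI route forces the data through the R session first
local_df <- dplyr::collect(for_test)        # tbl_spark -> local data frame
DBI::dbWriteTable(con, "DZ_test", local_df) # now the signature matches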

Assuming the driver is already included using spark.jars.packages, you'll need something similar to:

spark_write_jdbc(
  df, "some_name",
  options = list(
    url = "jdbc:sqlserver://...",
    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"))

Upvotes: 3
