CKre

Reputation: 191

spark.conf.set with SparkR

I have a Databricks cluster running on Azure and want to read/write data from Azure Data Lake Storage using SparkR/sparklyr. I have already set up both resources (the cluster and the Data Lake Storage account).

Now I need to give the Spark environment the configuration it needs to authenticate against the Data Lake Storage.

Setting the configs using the PySpark API works:

    spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
    spark.conf.set("dfs.adls.oauth2.client.id", "****")
    spark.conf.set("dfs.adls.oauth2.credential", "****")
    spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/****/oauth2/token")
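As a sketch, the four calls above can be driven from a single dict, which keeps the keys in one place and means the secret is passed in only once. This assumes conf is spark.conf in a Databricks Python notebook. The helper names and the tenant_id parameter (standing in for the elided directory ID in the token URL) are hypothetical.

```python
# Token endpoint; tenant_id is a hypothetical parameter for the elided "****" part.
ADLS_TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/token"


def adls_oauth_settings(client_id, credential, tenant_id):
    """Return the four ADLS OAuth settings from the snippet above as a dict."""
    return {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url": ADLS_TOKEN_URL.format(tenant_id=tenant_id),
    }


def apply_adls_oauth(conf, client_id, credential, tenant_id):
    """Call conf.set(key, value) for each setting (conf is assumed to be spark.conf)."""
    settings = adls_oauth_settings(client_id, credential, tenant_id)
    for key, value in settings.items():
        conf.set(key, value)
    return settings
```

Usage would look like apply_adls_oauth(spark.conf, "<client-id>", "<client-secret>", "<tenant-id>"), with the placeholders replaced by your own values.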

In the end, though, I want to use SparkR/sparklyr, and there I couldn't figure out where the equivalent of spark.conf.set goes. I would have guessed something like:

    sparkR.session(
      sparkConfig = list(spark.driver.memory = "2g",
        spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential"),
        spark.conf.set("dfs.adls.oauth2.client.id", "****"),
        spark.conf.set("dfs.adls.oauth2.credential", "****"),
        spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/****/oauth2/token")
      ))

Would be awesome if one of the experts using the SparkR API could help me out here. Thanks!

EDIT: The answer by user10791349 is correct and works. Another solution, and the recommended practice, is to mount the external data source. Mounting is currently only possible from Scala or Python, but once the data source is mounted it is also available through the SparkR API.
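A minimal sketch of the mounting approach from a Python cell, assuming ADLS Gen1 (which is what the dfs.adls.oauth2.* keys configure) and the dbutils object that Databricks notebooks provide. The function name, store name, directory, mount name and tenant_id parameter are all hypothetical. dbutils is passed in explicitly so the function does not depend on notebook globals. In practice the credential would come from a secret store rather than a literal.

```python
def mount_adls_gen1(dbutils, store_name, directory, mount_name,
                    client_id, credential, tenant_id):
    """Mount an ADLS Gen1 directory under /mnt/<mount_name> (Databricks only)."""
    # Same four OAuth settings as the spark.conf.set calls in the question.
    configs = {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    # ADLS Gen1 URIs use the adl:// scheme and the azuredatalakestore.net host.
    source = f"adl://{store_name}.azuredatalakestore.net/{directory}"
    mount_point = f"/mnt/{mount_name}"
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return mount_point
```

After a one-time mount, SparkR code should be able to read files through paths under the mount point (for example with read.df on a /mnt/... path) without setting any OAuth keys itself.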

Upvotes: 2

Views: 1858

Answers (1)

user10791349

Reputation: 46

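According to the sparkR.session documentation, sparkConfig should be a

"named list of Spark configuration to set on worker nodes."

So instead of spark.conf.set(...) calls, pass each setting as a named element of that list: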

    library(SparkR)

    sparkR.session(
      # master, appName, and any other options go here
      sparkConfig = list(
        spark.driver.memory = "2g",
        dfs.adls.oauth2.access.token.provider.type = "ClientCredential",
        dfs.adls.oauth2.client.id = "****",
        dfs.adls.oauth2.credential = "****",
        dfs.adls.oauth2.refresh.url = "https://login.microsoftonline.com/****/oauth2/token"
      )
    )

Remember that many configuration options are recognized only if there is no active session when sparkR.session() is called.

Upvotes: 3
