Reputation: 23
I'm trying to use R to connect to Azure Blob Storage, where I have some CSV files stored. I need to load them into a data frame and apply some transformations before writing them back to another Blob container. I'm doing this through Databricks so I can ultimately call the notebook from Azure Data Factory and include it in a pipeline.
Databricks gives me a sample notebook in Python, where a connection can be made with the following code:
storage_account_name = "testname"
storage_account_access_key = "..."
file_location = "wasb://[email protected]/testfile.csv"
spark.conf.set(
  "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
  storage_account_access_key)
df = spark.read.format('csv').load(file_location, header = True, inferSchema = True)
Is there something similar in R? I can use the SparkR or sparklyr package in R if either helps me load a file into a Spark DataFrame.
Upvotes: 2
Views: 3717
Reputation: 12788
For your information, I have been informed that R is not capable of doing the actual mounting. The workaround is to mount the container using another language, such as Python, and then read the file using the SparkR library, as shown below.
The two most commonly used libraries that provide an R interface to Spark are SparkR and sparklyr. Databricks notebooks and jobs support both packages, although you cannot use functions from both SparkR and sparklyr with the same object.
Mount using Python:
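In code form, the mount step looks roughly like this. This is a minimal sketch using dbutils.fs.mount; every <...> value is a placeholder for your own container, storage account, mount point, and access key:

# Run once from a Python cell; the mount is then visible to every language in the workspace.
# All <...> values are placeholders.
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-account-access-key>"})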
Run an R notebook using the SparkR library:
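A minimal sketch of the read step, assuming the /mnt/<mount-name> mount point created above; read.df returns a SparkDataFrame that you can transform and then write back with write.df:

library(SparkR)

# Read the CSV from the mount point created in the Python cell above
df <- read.df("/mnt/<mount-name>/testfile.csv",
              source = "csv",
              header = "true",
              inferSchema = "true")

# ...apply transformations, then write the result to another mounted container
write.df(df, path = "/mnt/<output-mount-name>/out", source = "csv", mode = "overwrite")

If you prefer sparklyr, an equivalent read (also a sketch, using spark_connect and spark_read_csv) would be:

library(sparklyr)

# In a Databricks notebook, connect to the cluster's existing Spark session
sc <- spark_connect(method = "databricks")
df <- spark_read_csv(sc, name = "testfile",
                     path = "/mnt/<mount-name>/testfile.csv",
                     header = TRUE, infer_schema = TRUE)

Keep in mind the caveat above: don't mix SparkR and sparklyr functions on the same object.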
Upvotes: 4