Chris Snow

Reputation: 24626

How to add Spark packages to a SparkR notebook on DSX?

The Spark documentation shows how to add a Spark package:

sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

I believe this can only be used when initialising the session.

How can we add Spark packages for SparkR in a notebook on DSX?

Upvotes: 1

Views: 183

Answers (1)

charles gomes

Reputation: 2155

Use the PixieDust package manager to install the Avro package:

pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

http://datascience.ibm.com/docs/content/analyze-data/Package-Manager.html
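
As a slightly fuller sketch of the install cell, assuming it is run in a Python kernel where pixiedust is importable: the helper `split_coordinate` below is hypothetical (it is not part of pixiedust) and only checks that the string has the `group:artifact:version` shape before it is handed to `pixiedust.installPackage`.

```python
# Run this in a Python kernel of the notebook, not in the R kernel.
PACKAGE = "com.databricks:spark-avro_2.11:3.0.0"


def split_coordinate(coordinate):
    """Hypothetical helper: split a Maven coordinate group:artifact:version."""
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected group:artifact:version, got %r" % coordinate)
    return tuple(parts)


# Fail early on a malformed coordinate instead of at install time.
group, artifact, version = split_coordinate(PACKAGE)

try:
    import pixiedust  # assumed to be available in the DSX Python kernel
except ImportError:
    pixiedust = None

if pixiedust is not None:
    # Same call as in the answer above.
    pixiedust.installPackage(PACKAGE)
else:
    print("pixiedust is not importable here; run this cell in a DSX Python kernel")
```

The coordinate check is optional; it just catches a typo such as a missing version before the kernel restart.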

Install it from the Python 1.6 kernel, since pixiedust can be imported in Python. (Remember that the package is installed at the Spark instance level.) Once it is installed, restart the kernel, switch to the R kernel, and read the Avro file like this:

df1 <- read.df("episodes.avro", source = "com.databricks.spark.avro", header = "true")

head(df1)

Complete notebook:

https://github.com/charles2588/bluemixsparknotebooks/raw/master/R/sparkRPackageTest.ipynb

Thanks, Charles.

Upvotes: 2
