James

Reputation: 2331

SparkR on Dataproc (Spark 1.5.x) does not work

When I attempt to use SparkR on a Cloud Dataproc cluster (version 0.2), I get an error like the following:

Exception in thread "main" java.io.FileNotFoundException: /usr/lib/spark/R/lib/sparkr.zip (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.deploy.RPackageUtils$.zipRLibraries(RPackageUtils.scala:215)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:371)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How can I fix this so I can use SparkR?

Upvotes: 0

Views: 240

Answers (1)

James

Reputation: 2331

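This issue is caused by a bug in the Spark 1.5 series (JIRA here). As the stack trace shows, spark-submit tries to create sparkr.zip in /usr/lib/spark/R/lib and is denied permission. To work around it, make that directory writable by running the following command on the master node, either by SSHing into it or by using an initialization action: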

sudo chmod 777 /usr/lib/spark/R/lib
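
If you go the initialization action route, the command above could be wrapped in a small script along these lines. This is only a sketch: the function name `fix_sparkr_lib_perms` and the `SPARKR_LIB_DIR` override are hypothetical names chosen for illustration, and it assumes the script runs as root (as initialization actions normally do), which is why there is no `sudo`. It does not check which node it is running on; it simply skips any node where the directory is missing.

```shell
#!/usr/bin/env bash
# Sketch of an initialization action that applies the chmod workaround.
# Assumption: the script runs as root, so sudo is not needed.
set -euo pipefail

# Directory from the stack trace; the env override is a hypothetical convenience.
SPARKR_LIB_DIR="${SPARKR_LIB_DIR:-/usr/lib/spark/R/lib}"

# Hypothetical helper: make the given directory world-writable if it exists.
fix_sparkr_lib_perms() {
  local dir="$1"
  if [[ -d "${dir}" ]]; then
    chmod 777 "${dir}"
    echo "made ${dir} world-writable"
  else
    # Do not fail cluster startup on nodes without SparkR installed.
    echo "skipping: ${dir} not found"
  fi
}

fix_sparkr_lib_perms "${SPARKR_LIB_DIR}"
```

You would upload the script to a Cloud Storage bucket and reference it when creating the cluster (for example through the `--initialization-actions` flag of the gcloud cluster-creation command). Note that mode 777 lets any user on the node write to that directory, so treat it as a temporary workaround rather than a permanent setting.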

This issue is supposed to be fixed in Spark 1.6, which Cloud Dataproc will support in a future image version.

Upvotes: 4
