Reputation: 2331
When I attempt to use SparkR on a Cloud Dataproc cluster (version 0.2) I get an error like the following:
Exception in thread "main" java.io.FileNotFoundException: /usr/lib/spark/R/lib/sparkr.zip (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.deploy.RPackageUtils$.zipRLibraries(RPackageUtils.scala:215)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:371)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
How can I fix this so I can use SparkR?
Upvotes: 0
Views: 240
Reputation: 2331
This is caused by a bug in the Spark 1.5 series (JIRA here): spark-submit tries to create sparkr.zip inside /usr/lib/spark/R/lib, which is not writable by ordinary users. To work around it, make that directory writable by running the following command on the master node, either by SSHing into the master node or by using an initialization action:
sudo chmod 777 /usr/lib/spark/R/lib
This issue is fixed upstream in Spark 1.6, which Cloud Dataproc will support in a future image version.
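If you go the initialization-action route, the one-line fix can be packaged as a small script like the sketch below. The script and function names are hypothetical; only the `/usr/lib/spark/R/lib` path comes from the error above. It also guards against the directory not existing, so it is harmless on nodes without Spark's R library:

```shell
#!/bin/bash
# fix_sparkr_perms.sh -- hypothetical initialization action name.
#
# SPARKR_LIB defaults to the directory spark-submit fails to write
# sparkr.zip into; it can be overridden for testing.
SPARKR_LIB="${SPARKR_LIB:-/usr/lib/spark/R/lib}"

fix_sparkr_perms() {
  # spark-submit needs write access here to create sparkr.zip.
  # 777 is coarse but matches the workaround for the Spark 1.5 bug.
  chmod 777 "${SPARKR_LIB}"
}

# Only apply the fix when the directory actually exists on this node.
if [ -d "${SPARKR_LIB}" ]; then
  fix_sparkr_perms
fi
```

You would upload the script to a Cloud Storage bucket of your own and pass it via the `--initialization-actions` flag when creating the cluster, so every new cluster gets the fix automatically.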
Upvotes: 4