Reputation: 5178
I am following the instructions found here on R-bloggers to set up Spark on a Red Hat machine. I want to use Spark in RStudio.
I downloaded spark-1.6.1-bin-hadoop2.6, followed the instructions, and put the following lines in a script in RStudio:
# Setting SPARK_HOME
Sys.setenv(SPARK_HOME = "~/Downloads/spark-1.6.1-bin-hadoop2.6")
# Setting library path
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
# create a spark context
sc <- sparkR.init(master = "local")
But the last line returns the following error:
Launching java with spark-submit command ~/Downloads/spark-1.6.1-bin-hadoop2.6/bin/spark-submit sparkr-shell /tmp/RtmpSwsYUW/backend_port3752546940e6
sh: ~/Downloads/spark-1.6.1-bin-hadoop2.6/bin/spark-submit: No such file or directory
I have tried every solution I could find online before asking this. For example:
JAVA_HOME and SPARK_HOME are both set.
Making spark-submit executable with chmod a+x spark-submit.cmd (and also chmod u+w spark-submit.cmd) did not work. (Of course I was in the correct directory.)
Running spark-shell in a terminal works (it returns a working shell in Scala).
Adding this before initialization:
Sys.setenv("SPARK_SUBMIT_ARGS" = "--master yarn-client sparkr-shell")
The only issue I can think of now is that there is no sparkr-shell in the directory; there are just sparkr.cmd and sparkr2.cmd. Now I am wondering: is this related to the Spark version I downloaded? Should I install Hadoop first?
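For what it's worth, a quick sanity check from within R (using my path; adjust as needed) suggests the file itself is in place:
# R's file.exists() does expand ~, so this reports TRUE here even though
# the shell call made by sparkR.init() fails on the same path
file.exists("~/Downloads/spark-1.6.1-bin-hadoop2.6/bin/spark-submit")
## [1] TRUE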
Upvotes: 0
Views: 1933
Reputation: 7396
SparkR invokes Spark through system2(), which quotes the command using shQuote() (see ?system2 and ?shQuote). This means that the ~ doesn't get expanded.
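You can see the quoting at work in a plain R session (a minimal illustration, using the path from the question):
shQuote("~/Downloads/spark-1.6.1-bin-hadoop2.6/bin/spark-submit")
## [1] "'~/Downloads/spark-1.6.1-bin-hadoop2.6/bin/spark-submit'"
# Inside single quotes, sh treats ~ as a literal character, which is
# exactly the "No such file or directory" failure above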
Just specify the full path:
Sys.setenv(SPARK_HOME = "/home/<youruser>/Downloads/spark-1.6.1-bin-hadoop2.6")
Or do the path expansion yourself:
Sys.setenv(SPARK_HOME = path.expand("~/Downloads/spark-1.6.1-bin-hadoop2.6"))
The .cmd files are for Windows, by the way, so they're not relevant here.
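Either way, you can check the expanded value before re-running sparkR.init() (a quick sketch; <youruser> is a placeholder):
Sys.setenv(SPARK_HOME = path.expand("~/Downloads/spark-1.6.1-bin-hadoop2.6"))
# The value now contains no ~, so shQuote() inside system2() cannot
# prevent its expansion
Sys.getenv("SPARK_HOME")
## [1] "/home/<youruser>/Downloads/spark-1.6.1-bin-hadoop2.6"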
Upvotes: 0