John Lui

Reputation: 1464

How do I install Apache Spark and get it up and running for R?

I am quite new to Hadoop and Apache Spark. First I read about what Hadoop and MapReduce basically are and how they came into being, and then about the advantages Apache Spark offers over Hadoop (such as faster processing both in memory and on disk, and multiple libraries that make our lives easier). Now I want to try Apache Spark myself, and I assume that means installing the Apache Spark software on my machine.

What I did was install Oracle VirtualBox and then Vagrant. I know that after downloading Vagrant and extracting the files, I have to run the command vagrant up, which will download and set up my virtual machine. HOWEVER, I want to use Apache Spark from R. I don't know Python, but I do know R, and I read some days back that Databricks has apparently released support for R. Since I am new to this, I assume there will be some shell where I can type my R commands and the computation will happen on Apache Spark.

Hence, I don't know how to proceed. Should I run vagrant up? As far as I understand, that would let me use Apache Spark from a Python shell. Is that the way forward, with some additional libraries installed afterwards for using R?

Upvotes: 1

Views: 2540

Answers (2)

doe doe

Reputation: 33

How do I install Apache Spark?

Please go to https://spark.apache.org/downloads.html

Please select the package pre-built for Hadoop 2.6 and later (the current release as of July 2, 2015)

Download and extract the archive (it is a .tgz file, so for example: tar -xzf spark-1.4.0-bin-hadoop2.6.tgz)

Please use the terminal to go into the download folder and then into the extracted folder:

cd Downloads/
cd spark-1.4.0-bin-hadoop2.6

get it up and running for R?

Please check your directory with the following command

ls

You will see the contents of the folder:

CHANGES.txt LICENSE NOTICE R README.md RELEASE bin conf data derby.log ec2 examples lib python sbin

Finally, please type the following command in the terminal to use R from Spark:

./bin/sparkR
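When the shell starts, SparkR 1.4 creates a SparkContext (sc) and an SQLContext (sqlContext) for you. As a quick sanity check, here is a minimal sketch using R's built-in faithful dataset; the createDataFrame and head calls are from the SparkR 1.4 API:

# create a Spark DataFrame from a local R data.frame
df <- createDataFrame(sqlContext, faithful)
# show the first rows, computed by Spark
head(df)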

Upvotes: 1

Kshitij Kulshrestha

Reputation: 2072

The package you are talking about is SparkR. There are a few packages you can import into R to use Spark locally, but if you want to use a Spark standalone cluster then you have to install Spark itself. As of Spark 1.4.0 the R package is bundled with the Spark installation, and you can use it directly by importing it into R.

This newly released package can be downloaded from this location -

https://spark.apache.org/downloads.html

Now you can use either RStudio or the R shell and run these lines to import the package -

Sys.setenv(SPARK_HOME = "/home/hduser/Downloads/FlareGet/Others/spark-1.4.0-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)
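After loading the package you still have to start a SparkContext yourself (unlike the sparkR shell, which does that for you). A minimal sketch using the SparkR 1.4 API; the "local" master string is just an example for running on your own machine:

# start Spark on the local machine
sc <- sparkR.init(master = "local")
# create an SQLContext for DataFrame operations
sqlContext <- sparkRSQL.init(sc)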

Or you can run the sparkR shell directly from the bin folder of the downloaded package; go to the bin folder and type at the command prompt:

./sparkR
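Inside that shell, sc and sqlContext are already defined, so you can go straight to working with data. A small example sketch using SparkR 1.4 functions (faithful is a dataset bundled with R):

df <- createDataFrame(sqlContext, faithful)
# keep only the eruptions with a waiting time under 50 minutes
head(filter(df, df$waiting < 50))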

Download the package from this location - http://www.webhostingjams.com/mirror/apache/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz

Upvotes: 1
