Reputation: 1464
So, I am quite new to Hadoop and Apache Spark; I am a beginner trying my hand at them. First I read about what Hadoop and MapReduce basically are and how they came into being, and then about the advantages Apache Spark offers over Hadoop (such as faster processing, both in memory and on disk, and multiple libraries that make our lives easier). Now I want to try Apache Spark myself, and to do that I am assuming I have to install a piece of software named Apache Spark on my machine.
What I did was install Oracle VirtualBox, and then Vagrant. I know that after downloading Vagrant and extracting the files, I have to run the command vagrant up, and it will download and install my virtual machine. HOWEVER, I want to use Apache Spark from R. I don't know Python, but I know R, and I read some days back that Databricks has apparently released support for R. Since I am new to this, I assume there will be some shell where I can type my R commands and the computation will take place using Apache Spark.
Hence, I don't know how to proceed. Should I run vagrant up? I guess that would let me use Apache Spark through a Python shell. Or is that the way forward anyway, and after doing that will I have to install some additional libraries to use R?
Upvotes: 1
Views: 2540
Reputation: 33
How do I install Apache Spark?
Please go to https://spark.apache.org/downloads.html
Please select the package pre-built for Hadoop 2.6 and later (the current release as of July 2, 2015)
Download and extract the file
Please open a terminal, go to the download folder, and then into the extracted folder
cd Downloads/
cd spark-1.4.0-bin-hadoop2.6
How do I get it up and running for R?
Please check your directory with the following command
ls
You will see the contents of the folder
CHANGES.txt NOTICE README.md bin data ec2 lib sbin
LICENSE R RELEASE conf derby.log examples python
Finally, please type the following command in the terminal to use R from Spark
./bin/sparkR
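Once the sparkR shell starts, you can run a quick computation to confirm that everything works. A minimal sketch, assuming the Spark 1.4.0 SparkR API, where the shell has already created a SparkContext (sc) and an SQLContext (sqlContext) for you:
# Convert R's built-in faithful dataset into a distributed Spark DataFrame
df <- createDataFrame(sqlContext, faithful)
# Print the first rows; this actually runs a Spark job
head(df)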
Upvotes: 1
Reputation: 2072
The package you are talking about is SparkR. Actually, there are a few packages you can import into your R session to use Spark locally from R, but if you want to use a Spark standalone cluster then you have to install Spark itself. As of Spark 1.4.0, the R package is embedded in the Spark installation, and you can use it directly by importing it into R.
This newly released package can be downloaded from this location -
https://spark.apache.org/downloads.html
Now you can use either RStudio or the R shell and run these lines to import the SparkR package -
Sys.setenv(SPARK_HOME="/home/hduser/Downloads/FlareGet/Others/spark-1.4.0-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
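Note that, unlike the sparkR shell, RStudio does not create a Spark context for you, so after loading the package you have to initialize one yourself. A minimal sketch, assuming the Spark 1.4.0 SparkR API (sparkR.init and sparkRSQL.init):
# Start a local SparkContext and an SQLContext on top of it
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)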
Or you can directly run the sparkR shell from the bin folder of the downloaded package - go to the bin folder and type at the command prompt
./sparkR
Download the package from this location - http://www.webhostingjams.com/mirror/apache/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
Upvotes: 1