Reputation: 4572
I'm very new to the concepts of Big Data and related areas; sorry if I've made a mistake or typo.
I would like to understand Apache Spark and use it only on my computer, in a development/test environment. Since Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I discard it? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.
Is there any reason to use Hadoop or any other distributed file system for Spark if I will run it on my computer for testing purposes?
Note that "Can apache spark run without hadoop?" is a different question from mine, because I do want to run Spark in a development environment.
Upvotes: 13
Views: 13690
Reputation: 4721
Yes, you can install Spark without Hadoop. Go through the Spark official documentation: http://spark.apache.org/docs/latest/spark-standalone.html
Rough steps:
Spark (without a Hadoop cluster) - available from the Spark download page, URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz (note that this "pre-built for Hadoop 2.7" package bundles the Hadoop client libraries, so it runs standalone without a separate Hadoop installation).
If this URL does not work, try to get it from the Spark download page.
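To try it out locally after downloading, here is a minimal sketch (assuming the 2.2.0 package linked above; any pre-built package works the same way):

# Unpack the distribution and start an interactive shell in local mode;
# "local[*]" runs Spark in-process on all cores, with no cluster or HDFS.
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz
cd spark-2.2.0-bin-hadoop2.7
./bin/spark-shell --master "local[*]"

In local mode, files are read from the local filesystem by default, so no distributed file system is needed for testing.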
Upvotes: 14
Reputation: 1734
This is not a proper answer to the original question; sorry, that is my fault.
If someone wants to run Spark using the "without Hadoop" distribution tar.gz, there is an environment variable to set. This spark-env.sh worked for me:
#!/bin/sh
# Point Spark at the jars of a locally installed Hadoop;
# requires the `hadoop` command to be on the PATH.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
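For completeness, a sketch of how to apply it (assuming your working directory is the unpacked "without Hadoop" distribution; Spark ships a template for this file):

# Create conf/spark-env.sh from the bundled template, add the export
# line above to it, then launch the shell in local mode:
cp conf/spark-env.sh.template conf/spark-env.sh
./bin/spark-shell --master "local[*]"

Spark sources conf/spark-env.sh at startup, so the Hadoop classpath is picked up automatically.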
Upvotes: 0