Michal

Reputation: 1895

Installing PySpark

I am trying to install PySpark by following the instructions, running this from the command line on the cluster node where I have Spark installed:

$ sbt/sbt assembly

This produces the following error:

-bash: sbt/sbt: No such file or directory

I try the next command:

$ ./bin/pyspark

I get this error:

-bash: ./bin/pyspark: No such file or directory

I feel like I'm missing something basic. What is missing? I have Spark installed and am able to access it using the command:

$ spark-shell

I have Python on the node and am able to open it using the command:

$ python

Upvotes: 10

Views: 18749

Answers (2)

Jon

Reputation: 2567

SBT is used to build a Scala project. If you're new to Scala/SBT/Spark, you're doing things the difficult way.

The easiest way to "install" Spark is simply to download it (I recommend Spark 1.6.1 -- personal preference) and extract the archive into the directory you want Spark "installed" in, say C:/spark-folder (Windows) or /home/usr/local/spark-folder (Ubuntu).
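For example, on Ubuntu the download-and-extract step might look like the following (a rough sketch; the exact URL and archive name depend on the Spark version and Hadoop build you pick):

$ wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

$ tar -xzf spark-1.6.1-bin-hadoop2.6.tgz -C /home/usr/local/

$ mv /home/usr/local/spark-1.6.1-bin-hadoop2.6 /home/usr/local/spark-folder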

After you extract it into your desired directory, you need to set your environment variables. How you do this depends on your OS; this step is, however, not strictly necessary just to run Spark (i.e. pyspark).
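On Ubuntu, for instance, setting the variables could look like this (a sketch assuming the /home/usr/local/spark-folder path from above; add the lines to ~/.bashrc to make them persistent):

$ export SPARK_HOME=/home/usr/local/spark-folder

$ export PATH=$SPARK_HOME/bin:$PATH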

If you do not set your environment variables, or don't know how to, an alternative is simply to go to your Spark directory in a terminal window, cd C:/spark-folder (Windows) or cd /home/usr/local/spark-folder (Ubuntu), and then type

./bin/pyspark

and Spark should run.

Upvotes: 2

Josh Rosen

Reputation: 13801

What's your current working directory? The sbt/sbt and ./bin/pyspark commands are relative to the directory containing Spark's code ($SPARK_HOME), so you should be in that directory when running those commands.
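For example, assuming your Spark checkout lives at /usr/local/spark (an assumed path; substitute wherever you actually put it), the commands from the docs would be run like this:

$ cd /usr/local/spark

$ sbt/sbt assembly

$ ./bin/pyspark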

Note that Spark offers pre-built binary distributions that are compatible with many common Hadoop distributions; this may be an easier option if you're using one of those distros.
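As a sketch, after unpacking one of those pre-built packages (the file name below is just an example; pick the build that matches your Hadoop version), pyspark starts with no sbt build step at all:

$ tar -xzf spark-1.6.1-bin-hadoop2.6.tgz

$ cd spark-1.6.1-bin-hadoop2.6

$ ./bin/pyspark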

Also, it looks like you linked to the Spark 0.9.0 documentation; if you're building Spark from source, I recommend following the latest version of the documentation.

Upvotes: 8
