Michal

Reputation: 1895

Installing PySpark

I am trying to install PySpark by following the instructions, running this from the command line on the cluster node where I have Spark installed:

$ sbt/sbt assembly

This produces the following error:

-bash: sbt/sbt: No such file or directory

I try the next command:

$ ./bin/pyspark

I get this error:

-bash: ./bin/pyspark: No such file or directory

I feel like I'm missing something basic. What is missing? I have Spark installed and am able to access it using the command:

$ spark-shell

I have Python on the node and am able to open it using the command:

$ python

Upvotes: 10

Views: 18749

Answers (2)

Jon

Reputation: 2567

SBT is used to build a Scala project. If you're new to Scala/SBT/Spark, you're doing things the difficult way.

The easiest way to "install" Spark is simply to download it (I recommend Spark 1.6.1 -- personal preference) and extract the archive into the directory you want Spark "installed" in, say C:/spark-folder (Windows) or /home/usr/local/spark-folder (Ubuntu).
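For example, on Ubuntu the download-and-extract step might look like the following (a rough sketch; the exact URL and archive name depend on the Spark version and Hadoop build you pick):

$ wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

$ tar -xzf spark-1.6.1-bin-hadoop2.6.tgz -C /home/usr/local/

$ mv /home/usr/local/spark-1.6.1-bin-hadoop2.6 /home/usr/local/spark-folder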

After you extract it into your desired directory, you need to set your environment variables. How you do this depends on your OS; this step is, however, not strictly necessary just to run Spark (i.e. pyspark).
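On Ubuntu, for instance, setting the variables could look like this (a sketch assuming the /home/usr/local/spark-folder path from above; add the lines to ~/.bashrc to make them persistent):

$ export SPARK_HOME=/home/usr/local/spark-folder

$ export PATH=$SPARK_HOME/bin:$PATH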

If you do not set your environment variables, or don't know how to, an alternative is simply to go to your Spark directory in a terminal window, cd C:/spark-folder (Windows) or cd /home/usr/local/spark-folder (Ubuntu), and then type

./bin/pyspark

and Spark should run.

Upvotes: 2

Josh Rosen

Reputation: 13801

What's your current working directory? The sbt/sbt and ./bin/pyspark commands are relative to the directory containing Spark's code ($SPARK_HOME), so you should be in that directory when running those commands.
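For example, assuming your Spark checkout lives at /usr/local/spark (an assumed path; substitute wherever you actually put it), the commands from the docs would be run like this:

$ cd /usr/local/spark

$ sbt/sbt assembly

$ ./bin/pyspark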

Note that Spark offers pre-built binary distributions that are compatible with many common Hadoop distributions; this may be an easier option if you're using one of those distros.
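As a sketch, after unpacking one of those pre-built packages (the file name below is just an example; pick the build that matches your Hadoop version), pyspark starts with no sbt build step at all:

$ tar -xzf spark-1.6.1-bin-hadoop2.6.tgz

$ cd spark-1.6.1-bin-hadoop2.6

$ ./bin/pyspark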

Also, it looks like you linked to the Spark 0.9.0 documentation; if you're building Spark from source, I recommend following the latest version of the documentation.

Upvotes: 8
