Nole

Reputation: 119

Submitting Python Application with Apache Spark Submit

I am trying to follow the examples on the Apache Spark documentation site: https://spark.apache.org/docs/2.0.0-preview/submitting-applications.html

I started a Spark standalone cluster and want to run the example Python application. From my spark-2.0.0-bin-hadoop2.7 directory, I ran the following command:

./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000

However, I get the following error:

jupyter: '/Users/MyName/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/pi.py' is not a Jupyter command

This is what my .bash_profile looks like:

#setting path for Spark
export SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'

What am I doing wrong?

Upvotes: 0

Views: 1763

Answers (2)

Lei Feng

Reputation: 11

Put PYSPARK_DRIVER_PYTHON=ipython on the same line, before the spark-submit command; the override then applies only to that one invocation.

Example:

PYSPARK_DRIVER_PYTHON=ipython ./bin/spark-submit \
/home/SimpleApp.py
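
Applied to the command from the question, the same one-off override would look like this (a sketch reusing the master URL and example script from the question):

PYSPARK_DRIVER_PYTHON=ipython ./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000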

Upvotes: 1

AbdealiLoKo

Reputation: 3357

The PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS variables are meant for launching the IPython/Jupyter shell when you open the pyspark shell (more info at How to load IPython shell with PySpark). Because your .bash_profile exports them globally, spark-submit also picks them up and hands your script to Jupyter, which is exactly the error you are seeing.

You can set this up like:

alias snotebook='PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_PATH/bin/pyspark --master local[2]'

That way the variables are set only for the alias and don't interfere with pyspark when submitting.
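
Putting it together, the relevant part of the .bash_profile would look like this (a sketch based on the file from the question; the two export lines are removed and folded into the alias):

# setting path for Spark
export SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7
# Jupyter settings live only in the alias, so spark-submit is unaffected
alias snotebook='PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_PATH/bin/pyspark --master local[2]'

In a shell that has already sourced the old file, you may also need to clear the variables once (unset PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS) before spark-submit stops invoking Jupyter.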

Upvotes: 1
