JMV12

Reputation: 1045

Execute Python script with Spark

I want to pass a Python script to the SparkContext from within my Jupyter notebook and have the output shown in the notebook as well. To test, I'm simply executing the following in my Jupyter notebook:

from pyspark import SparkConf, SparkContext

sparkConf = SparkConf()
sc = SparkContext(conf=sparkConf)

sc.addPyFile('test.py')

With test.py looking like:

rdd = sc.parallelize(range(100000000))
print(rdd.sum())

But when I execute the sc.addPyFile line in my notebook, I do not see the output. Am I passing the PySpark script to my SparkContext incorrectly?

Upvotes: 2

Views: 1349

Answers (1)

Shubham Jain

Reputation: 5536

The function you are using does not trigger the job; instead, it passes the Python module to the SparkContext so that it can be imported in your script as needed.

See here: https://spark.apache.org/docs/0.7.3/api/pyspark/pyspark.context.SparkContext-class.html#addPyFile
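As a minimal sketch of that usage (the compute_sum helper here is hypothetical, and it assumes test.py sits in the notebook's working directory), test.py would expose a function instead of running driver code at import time, and the notebook would import it after shipping the file:

# test.py -- define a function; nothing runs at import time
def compute_sum(sc):
    rdd = sc.parallelize(range(100000000))
    return rdd.sum()

# notebook cell -- ship the module to the executors, then import and call it
sc.addPyFile('test.py')
import test
print(test.compute_sum(sc))  # the result prints in the notebook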

To trigger a job, you need to run spark-submit test.py outside of your Jupyter notebook.
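For that to work, test.py has to be a self-contained driver that creates its own SparkContext; a minimal sketch, assuming spark-submit is on your PATH:

# test.py -- standalone driver script for spark-submit
from pyspark import SparkConf, SparkContext

if __name__ == '__main__':
    sc = SparkContext(conf=SparkConf().setAppName('sum-example'))
    rdd = sc.parallelize(range(100000000))
    print(rdd.sum())  # printed to the terminal running spark-submit
    sc.stop()

Then run it with spark-submit test.py.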

Upvotes: 1
