Reputation: 1045
I want to pass a Python test script to the SparkContext within my Jupyter notebook and have the output shown in the notebook as well. To test, I'm simply executing the following in my Jupyter notebook:
from pyspark import SparkConf, SparkContext

sparkConf = SparkConf()
sc = SparkContext(conf=sparkConf)
sc.addPyFile('test.py')
With test.py looking like:
rdd = sc.parallelize(range(100000000))
print(rdd.sum())
But when I execute the sc.addPyFile line in my notebook, I do not see any output. Am I passing the PySpark script into my SparkContext incorrectly?
Upvotes: 2
Views: 1349
Reputation: 5536
The function you are using does not trigger the job; instead, it passes the Python module to the SparkContext so that it can be imported in your script as needed.
See here: https://spark.apache.org/docs/0.7.3/api/pyspark/pyspark.context.SparkContext-class.html#addPyFile
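For example, addPyFile is typically used to ship a helper module to the executors so it can be imported inside your transformations. A minimal sketch, assuming a hypothetical helpers.py sitting next to the notebook (not your test.py):

# helpers.py -- shipped to the executors via addPyFile
def square(x):
    return x * x

# In the notebook:
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf())
sc.addPyFile('helpers.py')            # distribute the module to the workers

import helpers                        # also importable on the driver, since the file is local
rdd = sc.parallelize(range(10))
print(rdd.map(helpers.square).sum())  # this action triggers a job and prints 285 in the notebook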
To trigger a job, you need to run
spark-submit test.py
outside of your Jupyter notebook.
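Note that when run with spark-submit, the script has to create its own SparkContext. A minimal sketch of what test.py could look like (an assumption, not your exact file):

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName('sum-test'))
rdd = sc.parallelize(range(100000000))
print(rdd.sum())   # prints to the console where spark-submit was launched
sc.stop()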
Upvotes: 1