Reputation: 25366
I have written quite a few Spark jobs in Java/Scala, where I can run a test Spark job directly from the main() program, as long as I add the required Spark jars to the Maven pom.xml.
Now I am starting to work with pyspark, and I am wondering if I can do something similar. For example, I am using PyCharm to run the wordCount job:
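The script itself is just a minimal word count, roughly along these lines (the input path below is a placeholder, not my actual file):

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="myWordCount")
    # read the text file, split it into words, and count each word
    counts = (sc.textFile("/path/to/input.txt")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print(word, count)
    sc.stop()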
If I just run the main() program, I get the following error:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/profiler/run_profiler.py", line 145, in <module>
profiler.run(file)
File "/Applications/PyCharm.app/Contents/helpers/profiler/run_profiler.py", line 84, in run
pydev_imports.execfile(file, globals, globals) # execute the script
File "/Users/edamame/PycharmProjects/myWordCount/myWordCount.py", line 6, in <module>
from pyspark import SparkContext
ImportError: No module named pyspark
Process finished with exit code 1
How do I import pyspark here, so that I can run a test job from the main() program like I did in Java/Scala?
I also tried to edit the interpreter path:
and my screenshot from Run -> Edit Configuration:
Last is my project structure screenshot:
Did I miss anything here? Thanks!
Upvotes: 2
Views: 12405
Reputation: 449
I added the py4j-x.x.x-src.zip and pyspark.zip under $SPARK_HOME/python/lib to the project structure (Preferences > Project > Project Structure, then "+ Add Content Root") and it worked fine.
PS: PyCharm already picked up $PYTHONPATH and $SPARK_HOME from the OS environment, as they were set in .bashrc/.bash_profile.
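If you would rather not touch the project structure, a rough equivalent (assuming SPARK_HOME is exported as above) is to put those same zips on sys.path at the top of the script, before importing pyspark:

import glob
import os
import sys

spark_home = os.environ["SPARK_HOME"]  # assumes SPARK_HOME is exported in .bashrc/.bash_profile
sys.path.append(os.path.join(spark_home, "python"))
# add pyspark.zip and py4j-x.x.x-src.zip without hard-coding the py4j version
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "*.zip")))

from pyspark import SparkContext  # this import should now resolve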
Upvotes: 1
Reputation: 25366
I finally got it to work by following the steps in this post. It is really helpful!
https://medium.com/data-science-cafe/pycharm-and-apache-spark-on-mac-os-x-990af6dc6f38#.jk5hl4kz0
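Once it is configured, a quick sanity check I run straight from PyCharm (just a throwaway snippet, not part of the post) is:

from pyspark import SparkContext

sc = SparkContext(appName="sanityCheck")
print(sc.parallelize(range(10)).sum())  # prints 45 if everything is wired up
sc.stop()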
Upvotes: 2