Reputation: 93
I am trying to work with PySpark in IntelliJ but I cannot figure out how to correctly install it / set up the project. I can work with Python in IntelliJ and I can use the pyspark shell, but I cannot tell IntelliJ how to find the Spark files (import pyspark results in "ImportError: No module named pyspark").
Any tips on how to include/import Spark so that IntelliJ can work with it are appreciated.
Thanks.
UPDATE:
I tried this piece of code:
from pyspark import SparkContext, SparkConf
spark_conf = SparkConf().setAppName("scavenge some logs")
spark_context = SparkContext(conf=spark_conf)
address = "C:\test.txt"
log = spark_context.textFile(address)
my_result = log.filter(lambda x: 'foo' in x).saveAsTextFile('C:\my_result')
which produced the following error message:
Traceback (most recent call last):
File "C:/Users/U546816/IdeaProjects/sparktestC/.idea/sparktestfile", line 2, in <module>
spark_conf = SparkConf().setAppName("scavenge some logs")
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\conf.py", line 97, in __init__
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\context.py", line 221, in _ensure_initialized
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\java_gateway.py", line 35, in launch_gateway
File "C:\Python27\lib\os.py", line 425, in __getitem__
return self.data[key.upper()]
KeyError: 'SPARK_HOME'
Process finished with exit code 1
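(For reference: the KeyError means the SPARK_HOME environment variable is not set for the process IntelliJ launches. A minimal workaround, assuming Spark is unpacked at C:\Users\U546816\Documents\Spark as the traceback suggests, is to set it in the script before anything from pyspark is imported:

import glob
import os
import sys

# Assumed install location, taken from the traceback above -- adjust as needed.
os.environ['SPARK_HOME'] = r'C:\Users\U546816\Documents\Spark'
# Make the bundled pyspark package importable without an SDK-level install.
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python'))
# The py4j dependency ships as a zip under python/lib; add it if present.
for zip_path in glob.glob(os.path.join(os.environ['SPARK_HOME'], 'python', 'lib', 'py4j-*.zip')):
    sys.path.append(zip_path)

from pyspark import SparkContext, SparkConf

The cleaner fix is to set SPARK_HOME in the run/debug configuration itself, as the answers below describe.)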
Upvotes: 8
Views: 17800
Reputation: 31
One problem I encountered was the space in 'Program Files\spark' when the environment variables SPARK_HOME and PYTHONPATH pointed there (as stated above), so I moved the Spark binaries to my user directory instead. Thanks to this answer.
Also, make sure you have installed the packages for the environment.
Ensure you see the pyspark package under Project Structure -> Platform Settings -> SDKs -> Python SDK (of choice) -> Packages.
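A quick way to confirm that the interpreter IntelliJ uses can actually see the package is to run a one-off check with the same SDK selected in the run configuration (a small sketch, nothing more):

try:
    import pyspark
    print(pyspark.__file__)  # shows where the package was resolved from
except ImportError:
    print("pyspark is not on this interpreter's path")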
Upvotes: 0
Reputation: 64
Set the environment variables SPARK_HOME and PYTHONPATH in your program's run/debug configuration.
For instance:
SPARK_HOME=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4
PYTHONPATH=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4/python
See the attached snapshot in IntelliJ IDEA.
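As a sanity check that the run/debug configuration actually passes these through, you can print them from the script itself (a hedged sketch, assuming the values above):

import os

# Both should echo the values configured in the run configuration;
# None means the variable was not exported to the process.
print(os.environ.get('SPARK_HOME'))
print(os.environ.get('PYTHONPATH'))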
Upvotes: 4
Reputation: 1405
For example, something of this kind:
from pyspark import SparkContext, SparkConf
spark_conf = SparkConf().setAppName("scavenge some logs")
spark_context = SparkContext(conf=spark_conf)
address = "/path/to/the/log/on/hdfs/*.gz"
log = spark_context.textFile(address)
my_result = (log
             # ...here go your actions and transformations, e.g. a filter:
             .filter(lambda line: 'foo' in line)
             ).saveAsTextFile('my_result')
Upvotes: 1