Reputation: 997
I am trying to run a simple Spark SQL program (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.
spark-submit HousePriceSolution.py
Error:
from pyspark.sql import SparkSession
ImportError: cannot import name SparkSession
Code:
from pyspark.sql import SparkSession  # SparkSession requires pyspark 2.0+

PRICE_SQ_FT = "Price SQ Ft"

if __name__ == "__main__":
    session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()

    # Read the CSV using its header row and let Spark infer the column types.
    realEstate = session.read \
        .option("header", "true") \
        .option("inferSchema", value=True) \
        .csv("hdfs:............./RealEstate.csv")

    # avg() names its result column "avg(Price SQ Ft)".
    realEstate.groupBy("Location") \
        .avg(PRICE_SQ_FT) \
        .orderBy("avg(Price SQ Ft)") \
        .show()

    session.stop()
Upvotes: 3
Views: 16188
Reputation: 716
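spark-submit is probably pointing to a different Spark installation. SparkSession was introduced in Spark 2.0, so this ImportError usually means that the pyspark package being imported comes from an older 1.x installation. Check which Spark version spark-submit uses with the following command: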
spark-submit --version
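spark-submit --version only reports the Spark installation behind the launcher. It does not show which pyspark package the Python interpreter actually picks up. The following is a minimal diagnostic sketch. It assumes you save it under a hypothetical name such as check_pyspark.py and launch it exactly like the real job (spark-submit check_pyspark.py). It prints the interpreter path, the pyspark version and location, and whether SparkSession is available. The helper name pyspark_report is illustrative and is not part of Spark. The script is kept compatible with both Python 2.7 and 3.

```python
from __future__ import print_function

import sys


def pyspark_report():
    """Describe the pyspark package that this interpreter actually imports."""
    report = {"python": sys.executable}
    try:
        import pyspark
        import pyspark.sql
    except ImportError as exc:
        # No usable pyspark on sys.path (or one of its dependencies is missing).
        report["found"] = False
        report["error"] = str(exc)
        return report
    report["found"] = True
    # Older pyspark builds may not define __version__, hence the getattr default.
    report["version"] = getattr(pyspark, "__version__", "unknown")
    # The file path shows which Spark installation the package came from.
    report["location"] = pyspark.__file__
    # SparkSession exists only in pyspark 2.0 and later.
    report["has_spark_session"] = hasattr(pyspark.sql, "SparkSession")
    return report


if __name__ == "__main__":
    for key, value in sorted(pyspark_report().items()):
        print(key, "=", value)
```

If has_spark_session is False, or location points outside the Spark 2.x $SPARK_HOME, the job is importing a stale pyspark.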
If the Spark version is correct, check what PYTHONPATH contains (echo $PYTHONPATH), because it may include the pyspark library from another Spark version; if so, remove that entry. If PYTHONPATH doesn't contain the pyspark library at all, add it like this:
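export PYTHONPATH="$SPARK_HOME/python:$(echo "$SPARK_HOME"/python/lib/py4j-*-src.zip):$PYTHONPATH"
Python does not expand wildcards in PYTHONPATH, so an entry like "$SPARK_HOME/python/lib/*" is treated as a literal (non-existent) path. List $SPARK_HOME/python and the py4j zip from $SPARK_HOME/python/lib explicitly. Putting them first also gives them precedence over any older pyspark that is still on the path.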
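PYTHONPATH entries end up on sys.path, so a short script can list every entry that supplies a pyspark package, in the order Python searches them. This is a minimal sketch under stated assumptions. It only inspects plain directories and zip archives (the two forms a Spark distribution uses for pyspark), and it ignores other import mechanisms. The function name find_pyspark_entries is hypothetical. Run it with the same interpreter and environment as the job.

```python
from __future__ import print_function

import os
import sys
import zipfile


def find_pyspark_entries(paths):
    """Return the entries of `paths` that provide a top-level pyspark package.

    Order is preserved, so the first hit is normally the copy that
    `import pyspark` picks up. Only directories and zip archives are checked.
    """
    hits = []
    for entry in paths:
        # An empty string on sys.path stands for the current directory.
        location = entry or os.curdir
        if os.path.isdir(location):
            if os.path.isfile(os.path.join(location, "pyspark", "__init__.py")):
                hits.append(entry)
        elif zipfile.is_zipfile(location):
            # e.g. $SPARK_HOME/python/lib/pyspark.zip
            with zipfile.ZipFile(location) as archive:
                if "pyspark/__init__.py" in archive.namelist():
                    hits.append(entry)
        # Anything else (missing paths, literal "*" patterns) is skipped,
        # because Python does not glob-expand sys.path entries either.
    return hits


if __name__ == "__main__":
    for position, entry in enumerate(find_pyspark_entries(sys.path)):
        print(position, entry)
```

More than one line of output means several Spark installations are visible. The first one listed is normally the one being imported.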
Upvotes: 2