Reputation: 905
I am new to Pyspark and nothing seems to be working out. Please rescue. I want to read a parquet file with Pyspark. I wrote the following codes.
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
sqlContext.read.parquet("my_file.parquet")
I got the following error
Py4JJavaError Traceback (most recent call last) /usr/local/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e:
/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 318 "An error occurred while calling {0}{1}{2}.\n". --> 319 format(target_id, ".", name), value) 320 else:
then I tried the following codes
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()
SQLContext.read.parquet("my_file.parquet")
Then the error was as follows :
AttributeError: 'property' object has no attribute 'parquet'
Upvotes: 1
Views: 15106
Reputation: 2112
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
sc.stop()
conf = (conf.setMaster('local[*]'))
sc = SparkContext(conf = conf)
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet("my_file.parquet")
Try this.
Upvotes: -1
Reputation: 665
You need to create an instance of SQLContext first.
This will work from pyspark shell:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
sqlContext.read.parquet("my_file.parquet")
If you are using spark-submit you need to create the SparkContext in which case you would do this:
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()
sqlContext = SQLContext(sc)
sqlContext.read.parquet("my_file.parquet")
Upvotes: 3