I am currently using SparkSession and was told that SparkContext is contained within SparkSession. However, when I run my code, I get an error saying that the SparkSession object has no textFile attribute, as if SparkContext did not exist in SparkSession.
Below is my code:
import findspark
findspark.init()
from pyspark.sql import SparkSession, Row
import collections
spark = SparkSession.builder.config("spark.sql.warehouse.dir", "file://C:/temp").appName("SparkSQL").getOrCreate()
lines = spark.textFile('C:/Users/file.xslx')
The error is as follows:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_59944/722806425.py in <module>
----> 1 lines = spark.textFile('C:/Users/samue/bt4221_spark/exercise/week5/customer-orders.xslx')
AttributeError: 'SparkSession' object has no attribute 'textFile'
My current versions are findspark 1.4.2 and pyspark 3.0.3.
I don't think it's a version issue. Any help is greatly appreciated! :)
textFile is a method of the SparkContext class, not of SparkSession, so call it through the session's sparkContext attribute:
spark.sparkContext.textFile('filepath')