Reputation: 339
When we start spark-shell, we see:
Spark context available as 'sc'.
Spark session available as 'spark'.
I read that a SparkSession includes the SparkContext, StreamingContext, HiveContext, and so on. If so, why are we not able to create an RDD using the SparkSession instead of the SparkContext?
scala> val a = sc.textFile("Sample.txt")
17/02/17 16:16:14 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
a: org.apache.spark.rdd.RDD[String] = Sample.txt MapPartitionsRDD[1] at textFile at <console>:24
scala> val a = spark.textFile("Sample.txt")
<console>:23: error: value textFile is not a member of org.apache.spark.sql.SparkSession
val a = spark.textFile("Sample.txt")
As shown above, sc.textFile succeeds in creating an RDD, but spark.textFile does not.
Upvotes: 12
Views: 10583
Reputation: 304
It can be created in the following way:
val a = spark.read.text("wc.txt")
This will create a DataFrame. If you want to convert it to an RDD, then use:
a.rdd
Please refer to the link below on the Dataset API: http://cdn2.hubspot.net/hubfs/438089/notebooks/spark2.0/Dataset.html
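A minimal sketch of that flow in the Scala shell (assuming a local wc.txt as in the answer; note that .rdd on a DataFrame yields an RDD[Row], while spark.read.textFile yields a Dataset[String] whose .rdd is an RDD[String]):
// DataFrame route: each line becomes a Row with a single "value" column
val df = spark.read.text("wc.txt")
val rowRdd = df.rdd          // RDD[Row]
// Dataset route: read the file as a Dataset[String] instead
val ds = spark.read.textFile("wc.txt")
val lineRdd = ds.rdd         // RDD[String]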
Upvotes: 1
Reputation: 3373
In Spark 2+, the SparkContext is available via the SparkSession, so all you need to do is:
spark.sparkContext.textFile(yourFileOrURL)
see the documentation on this access method here.
The same form works in PySpark:
spark.sparkContext.textFile(yourFileOrURL)
see the documentation here.
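A minimal sketch in the Scala shell (assuming the same Sample.txt from the question):
// Reach the underlying SparkContext through the session and build an RDD from a text file
val rdd = spark.sparkContext.textFile("Sample.txt")
// Run an action to confirm the RDD works, e.g. count the lines
rdd.count()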
Upvotes: 10
Reputation: 4623
In earlier versions of Spark, the SparkContext was the entry point for Spark. As the RDD was the main API, RDDs were created and manipulated using the SparkContext APIs.
For every other API we needed a different context: for streaming, a StreamingContext; for SQL, a SQLContext; and for Hive, a HiveContext.
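For example, the pre-2.0 pattern looked roughly like this (a minimal sketch using the Spark 1.x APIs; the Hive and streaming contexts come from the spark-hive and spark-streaming modules):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
// One entry point per API family in Spark 1.x
val conf = new SparkConf().setAppName("demo").setMaster("local[*]")
val sc = new SparkContext(conf)                  // RDDs
val sqlContext = new SQLContext(sc)              // SQL / DataFrames
val hiveContext = new HiveContext(sc)            // Hive tables
val ssc = new StreamingContext(sc, Seconds(1))   // streaming, with 1-second batches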
But as the Dataset and DataFrame APIs became the new standard, Spark needed an entry point built for them. So in Spark 2.0, Spark has a new entry point for the Dataset and DataFrame APIs called SparkSession.
SparkSession is essentially a combination of SQLContext, HiveContext, and a future StreamingContext.
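In Spark 2.0 that single entry point is created with the builder API (a minimal sketch for a standalone application; in spark-shell the session is already provided as spark, as shown in the question):
import org.apache.spark.sql.SparkSession
// enableHiveSupport() wires in the Hive functionality that HiveContext used to provide
val spark = SparkSession.builder()
  .appName("demo")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()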
All the APIs available on those contexts are available on the SparkSession as well. The SparkSession internally has a SparkContext for the actual computation.
The sparkContext still contains the methods it had in previous versions.
The methods of SparkSession can be found here.
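A minimal sketch in the Scala shell tying this together (assuming the same Sample.txt from the question):
// RDD API, via the SparkContext that the session carries internally
val rdd = spark.sparkContext.textFile("Sample.txt")
// DataFrame/Dataset API, directly on the session (what SQLContext/HiveContext used to expose)
val df = spark.read.text("Sample.txt")
// SQL, also directly on the session
df.createOrReplaceTempView("lines")
spark.sql("SELECT count(*) FROM lines").show()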
Upvotes: 8