Reputation: 615
Spark 2.x here. I need to set the following Hadoop configurations so that my SqlContext can talk to S3:
sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "blah1")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "blah 2")
However, it seems that as of 2.x, SparkContext and SqlContext are two separate objects that are built from a SparkSession:
val sparkContext = SparkSession.builder().appName("myapp").getOrCreate().sparkContext
val sqlContext = SparkSession.builder().appName("myapp").getOrCreate().sqlContext
So how do I set the sparkContext.hadoopConfiguration properties for the SQLContext if the SQLContext is totally separate from the SparkContext?!
Upvotes: 1
Views: 2730
Reputation: 35229
if the SQLContext is totally separate from the SparkContext?!
Neither SparkSession nor SQLContext is separate from SparkContext; both are tightly bound to a specific SparkContext instance. Also, you shouldn't need SQLContext for anything other than legacy applications when using Spark 2.x. For everything else, SparkSession provides an equivalent interface.
Just initialize SparkSession
val spark = SparkSession.builder().appName("myapp").getOrCreate()
and use its context to set the Hadoop configuration:
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "blah1")
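For completeness, a minimal end-to-end sketch (the bucket path is a made-up placeholder, and the second setter simply mirrors the key from the question):

import org.apache.spark.sql.SparkSession

// Build (or retrieve) the single SparkSession for this application.
val spark = SparkSession.builder().appName("myapp").getOrCreate()

// Both credentials go on the one shared Hadoop configuration; everything
// derived from this session (including its sqlContext) sees them.
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "blah1")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "blah2")

// The read goes through the session, which shares that configuration.
val df = spark.read.text("s3n://my-bucket/some/path")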
In your code, setting the configuration on sparkContext would work as well. Each Spark application can have only one SparkContext, and it is reused each time you call SparkSession.builder.getOrCreate, or even if you create a new session with newSession.
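To see that sharing in action, here is a small sketch (the assertions are purely illustrative; the key and app name reuse the question's placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("myapp").getOrCreate()
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "blah1")

// A second getOrCreate, and even newSession, reuse the same SparkContext,
// so the Hadoop configuration set above is visible from all of them.
val again = SparkSession.builder().appName("myapp").getOrCreate()
val fresh = spark.newSession()

assert(again.sparkContext eq spark.sparkContext)
assert(fresh.sparkContext eq spark.sparkContext)
assert(fresh.sqlContext.sparkContext.hadoopConfiguration
  .get("fs.s3n.awsAccessKeyId") == "blah1")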
Upvotes: 2