hotmeatballsoup

Reputation: 615

Setting Hadoop Config properties for Spark 2.x SQLContexts

Spark 2.x here. I need to set the following Hadoop configurations so that my SqlContext can talk to S3:

sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "blah1")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "blah 2")

However, it seems that as of 2.x, SparkContext and SQLContext are two separate objects, both obtained from a SparkSession:

val sparkContext = SparkSession.builder().appName("myapp").getOrCreate().sparkContext
val sqlContext = SparkSession.builder().appName("myapp").getOrCreate().sqlContext

So how do I set the sparkContext.hadoopConfiguration properties for the SQLContext if the SQLContext is totally separate from the SparkContext?!

Upvotes: 1

Views: 2730

Answers (1)

Alper t. Turker

Reputation: 35229

if the SQLContext is totally separate from the SparkContext?!

Neither SparkSession nor SQLContext is separate from SparkContext. Both are tightly bound to a specific SparkContext instance. Also, when using Spark 2.x, you shouldn't need SQLContext for anything other than legacy applications; for everything else, SparkSession provides an equivalent interface.

Just initialize a SparkSession:

val spark = SparkSession.builder().appName("myapp").getOrCreate()

and use its context to set the Hadoop configuration:

spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "blah1")
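
With the keys in place, the same session, and the sqlContext derived from it, can read from S3. A minimal sketch, assuming the hadoop-aws connector is on the classpath and that s3n://my-bucket/data.csv is a hypothetical path:

// Reads through the SparkSession and through its sqlContext both use
// the Hadoop configuration set above:
val df = spark.read.option("header", "true").csv("s3n://my-bucket/data.csv")
val df2 = spark.sqlContext.read.option("header", "true").csv("s3n://my-bucket/data.csv")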

In your code, setting the configuration on sparkContext would work as well. Each Spark application can have only one SparkContext, and it is reused each time you call SparkSession.builder.getOrCreate, or even when you create a newSession.
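
A quick sketch to illustrate that point (the extra session names here are illustrative, not part of the original code):

import org.apache.spark.sql.SparkSession

val spark1 = SparkSession.builder().appName("myapp").getOrCreate()
val spark2 = SparkSession.builder().appName("myapp").getOrCreate()
val spark3 = spark1.newSession()

// All three sessions wrap the same SparkContext, and therefore share
// the same hadoopConfiguration:
assert(spark1.sparkContext eq spark2.sparkContext)
assert(spark1.sparkContext eq spark3.sparkContext)
assert(spark1.sqlContext.sparkContext eq spark1.sparkContext)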

Upvotes: 2
