Sai Wai Maung

Reputation: 1617

Spark: how to avoid using AWS credentials explicitly in a Spark application

In my Spark application, AWS credentials are passed in via command-line arguments.

spark.sparkContext.hadoopConfiguration.set("fs.s3.awsAccessKeyId", awsAccessKeyId)
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", awsSecretAccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

However, in cluster mode, explicitly passing credentials between nodes is a huge security issue, since the credentials travel as plain text.

How do I make my application work with an IAM role, or some other proper approach, so that these two lines of code are not needed in the Spark app:

spark.sparkContext.hadoopConfiguration.set("fs.s3.awsAccessKeyId", awsAccessKeyId)
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", awsSecretAccessKey)

Upvotes: 1

Views: 2356

Answers (1)

Sandeep Purohit

Reputation: 3692

You can add the following config to core-site.xml in your Hadoop conf, so you don't have to add it in your code base:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>my_aws_access_key_id_here</value>
  </property>
  <property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>my_aws_secret_access_key_here</value>
  </property>
</configuration>

To use the above file, simply export HADOOP_CONF_DIR=~/Private/.aws/hadoop_conf before running Spark, or set it in conf/spark-env.sh.
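
With the keys in core-site.xml and HADOOP_CONF_DIR pointing at that directory, the Spark code itself no longer needs any credential calls; a minimal sketch, where the bucket name and path are placeholders:

import org.apache.spark.sql.SparkSession

object S3ReadWithoutCreds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("S3ReadWithoutCreds").getOrCreate()

    // No awsAccessKeyId/awsSecretAccessKey calls here -- the fs.s3n.* keys are
    // picked up from the core-site.xml found via HADOOP_CONF_DIR
    val lines = spark.sparkContext.textFile("s3n://my-bucket/path/to/input")
    println(lines.count())

    spark.stop()
  }
}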

As for IAM roles, there is already an open issue for this in Spark 1.6: https://issues.apache.org/jira/browse/SPARK-16363

Upvotes: 2
