nanounanue

Reputation: 8342

Specify the AWS credentials in hadoop

I want to specify the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID at run-time.

I already tried using

hadoop -Dfs.s3a.access.key=${AWS_ACESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY} fs -ls s3a://my_bucket/

and

export HADOOP_CLIENT_OPTS="-Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}"

and

export HADOOP_OPTS="-Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}"

In the last two examples, I tried to run with:

hadoop fs -ls s3a://my-bucket/

In every case I got:

-ls: Fatal internal error
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
        at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
        at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
        at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

What am I doing wrong?

Upvotes: 1

Views: 6774

Answers (2)

stevel

Reputation: 13430

I think part of the problem is that, confusingly, unlike the JVM -D options, the Hadoop shell's -D option expects a space between -D and the key, e.g.:

hadoop fs -ls -D fs.s3a.access.key=AAIIED s3a://landsat-pds/

I would still avoid doing that on the command line though, as anyone who can do a ps command can see your secrets.

Generally we stick them into core-site.xml when running outside EC2; in EC2 the credentials are picked up automatically from the instance metadata.
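
For reference, here is a minimal sketch of what those core-site.xml entries might look like (the values are placeholders; the property names are the same fs.s3a.* keys used above):

<configuration>
  <!-- placeholder values: substitute your own credentials and restrict read access to this file -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
  </property>
</configuration>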

Upvotes: 0

franklinsijo

Reputation: 18270

This is the correct way to pass the credentials at runtime:

hadoop fs -Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY} -ls s3a://my_bucket/

Your syntax needs a small fix: the -D options must come after the fs subcommand, not before it. Also make sure that empty strings are not passed as the values of these properties; empty values make the runtime properties invalid, and the S3A client will go on searching for credentials as per the authentication chain.
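
As a quick sanity check, something like this (a sketch, assuming the variables were exported in the current shell) fails fast if either variable is unset or empty before invoking the command:

: "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID is unset or empty}"
: "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY is unset or empty}"
hadoop fs -Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY} -ls s3a://my_bucket/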

The S3A client follows this authentication chain:

  1. If login details were provided in the filesystem URI, a warning is printed and then the username and password extracted for the AWS key and secret respectively.
  2. The fs.s3a.access.key and fs.s3a.secret.key are looked for in the Hadoop XML configuration.
  3. The AWS environment variables are then looked for.
  4. An attempt is made to query the Amazon EC2 Instance Metadata Service to retrieve credentials published to EC2 VMs.

Other possible ways to pass the credentials at runtime (note that neither is particularly safe or recommended):

1) Embed them in the S3 URI

hdfs dfs -ls s3a://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@my-bucket/

If the secret key contains any + or / symbols, escape them with %2B and %2F respectively.
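
For example (hypothetical key values, shown only to illustrate the encoding):

# raw secret (hypothetical):     abc/DEF+ghi
# percent-encoded for the URI:   abc%2FDEF%2Bghi
hdfs dfs -ls s3a://AKIAIOSFODNN7EXAMPLE:abc%2FDEF%2Bghi@my-bucket/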

Never share the URL or any logs generated using it, and never use such an inline authentication mechanism in production.

2) Export environment variables for the session

export AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_ACCESS_KEY>

hdfs dfs -ls s3a://my-bucket/

Upvotes: 7
