Reputation: 640
I would like to read a file from S3 in my EMR Hadoop job. I am using the Custom JAR option.
I have tried two solutions:
1. org.apache.hadoop.fs.S3FileSystem: throws a NullPointerException.
2. com.amazonaws.services.s3.AmazonS3Client: throws an exception saying "Access denied".
What I fail to grasp is that I am starting the job from the Console, so obviously I should have the necessary permissions. However, the AWS_*_KEY keys are missing from the environment variables (System.getenv()) that are available to the mapper.
I am sure I am doing something wrong, just not sure what.
Upvotes: 6
Views: 13466
Reputation: 41
Probably a little bit late, but...
Use InstanceProfileCredentialsProvider for AmazonS3Client.
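A minimal sketch, assuming the AWS SDK for Java v1 is on the classpath; the class name, bucket, and key are placeholders:

import java.io.InputStream;

import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

public class S3Reader {
    public static InputStream open(String bucket, String key) {
        // Credentials are fetched from the EC2 instance profile attached to
        // the EMR node, so no access keys need to appear in the environment.
        AmazonS3 s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider());
        S3Object object = s3.getObject(bucket, key);
        return object.getObjectContent();
    }
}

This also explains why the AWS_*_KEY variables are absent from System.getenv(): with IAM roles, credentials are delivered through the instance metadata service rather than environment variables.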
Upvotes: 4
Reputation: 212
I think that your EMR cluster needs to have access to S3. You can create an IAM role for your EMR cluster and give it access to S3. Check this link: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html
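If you launch the cluster programmatically, the roles can be attached at creation time. A sketch using the AWS SDK for Java; the cluster name, AMI version, and instance settings are placeholder assumptions, and the role names are the AWS-managed defaults:

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class LaunchCluster {
    public static void main(String[] args) {
        AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient();
        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("my-cluster")     // placeholder
                .withAmiVersion("3.1.0")    // placeholder
                // Instance profile whose policy must grant the S3 access the job needs.
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withServiceRole("EMR_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                        .withInstanceCount(1)
                        .withMasterInstanceType("m1.medium")
                        .withKeepJobFlowAliveWhenNoSteps(true));
        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Cluster id: " + result.getJobFlowId());
    }
}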
Upvotes: 2
Reputation: 31515
I think the syntax is
hadoop jar your.jar com.your.main.Class -Dfs.s3n.awsAccessKeyId=<access-id> -Dfs.s3n.awsSecretAccessKey=<secret-key>
Then the path to the common prefix you wish to read should be of the form
s3n://bucket-name/common/prefix/path
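Note that -D options are only folded into the job Configuration when the main class runs through ToolRunner (which applies GenericOptionsParser). A minimal sketch; the class name and S3 path are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class S3nReadJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already merged the -Dfs.s3n.* options into this Configuration.
        Configuration conf = getConf();
        Path path = new Path("s3n://bucket-name/common/prefix/path"); // placeholder
        FileSystem fs = path.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(path)) {
            // ... read the stream ...
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new S3nReadJob(), args));
    }
}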
Upvotes: 0