David Nemeskey

Reputation: 640

How to read a file from S3 in EMR?

I would like to read a file from S3 in my EMR Hadoop job. I am using the Custom JAR option.

I have tried two solutions:

What I fail to grasp is that I am starting the job from the Console, so obviously I should have the necessary permissions. However, the AWS_*_KEY keys are missing from the environment variables (System.getenv()) that are available to the mapper.

I am sure I am doing something wrong; I am just not sure what.

Upvotes: 6

Views: 13466

Answers (3)

Ivan Konyshev

Reputation: 41

Probably a little bit late, but... Use InstanceProfileCredentialsProvider for AmazonS3Client.
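
A minimal sketch of what that could look like, assuming the AWS SDK for Java 1.x is on the job's classpath and the cluster's EC2 instance profile (the EMR role) is allowed to read the bucket. The class and method names (S3FirstLine, readFirstLine) are made up for illustration; the bucket and key are whatever your job needs.

```java
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
public class S3FirstLine {

    // Reads the first line of s3://<bucket>/<key> using the credentials
    // that the EC2 instance metadata service hands out for the instance
    // profile, so no AWS_*_KEY environment variables are needed.
    public static String readFirstLine(String bucket, String key) throws IOException {
        AmazonS3Client s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider());
        S3Object object = s3.getObject(bucket, key);
        // Closing the reader also closes the underlying S3 content stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(object.getObjectContent(), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }
}
```

You could call something like this from the mapper's setup() method. It only works on the cluster nodes themselves, because the credentials come from the instance metadata service.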

Upvotes: 4

SelimN

Reputation: 212

I think your EMR cluster needs to have access to S3. You can create an IAM role for the cluster and give that role access to S3. See this link: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html

Upvotes: 2

samthebest

Reputation: 31515

I think the syntax is

hadoop jar your.jar com.your.main.Class -Dfs.s3n.awsAccessKeyId=<access-id> -Dfs.s3n.awsSecretAccessKey=<secret-key>

Then the path to the common prefix you wish to read should be of the form

s3n://bucket-name/common/prefix/path
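
To make the shape of that path concrete, here is a small sketch that splits it into the bucket name and the key prefix using only java.net.URI. The class name S3nPath and its methods are invented for this example, and it assumes the bucket name is a valid URI authority.

```java
import java.net.URI;

// Hypothetical helper, only meant to illustrate the path layout.
public class S3nPath {

    // The bucket is the authority part of the URI, e.g. "bucket-name".
    public static String bucket(String path) {
        return URI.create(path).getAuthority();
    }

    // The key prefix is the URI path without its leading slash.
    public static String prefix(String path) {
        String p = URI.create(path).getPath();
        return p.startsWith("/") ? p.substring(1) : p;
    }

    public static void main(String[] args) {
        String path = "s3n://bucket-name/common/prefix/path";
        // prints "bucket-name common/prefix/path"
        System.out.println(bucket(path) + " " + prefix(path));
    }
}
```

So everything between s3n:// and the next slash is the bucket, and the rest is the prefix that all the objects you want to read share.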

Upvotes: 0
