stevec

Reputation: 52678

How to access S3 data from R on EC2 using aws.s3 package functions (s3write_using and s3read_using)?

I am trying to read from and write to S3 from R on an EC2 instance using s3write_using() from the aws.s3 package.

What I've done so far

The EC2 instance has an IAM role attached that allows read/write access to a specific AWS S3 bucket.

I have associated the IAM role with the EC2 instance. I have also created a bucket (in this example, please assume it is called my-unique-bucket).

When I connect to the EC2 instance, open R, and run something like s3write_using(mtcars, FUN = write.csv, object = "mtcars.csv", bucket = "my-unique-bucket")

I see

List of 4
 $ Code     : chr "AccessDenied"
 $ Message  : chr "Access Denied"
 $ RequestId: chr "3B942C125C154B49"
 $ HostId   : chr "0dgc4Iuv3EXdQxMgkh4Qkxt+aADzxsVYp6pq2k3/OjSztFlV1nftjn4MkIvNZ+wCVqzeJsttY44="
 - attr(*, "headers")=List of 6
  ..$ x-amz-request-id : chr "3B942C125C154B49"
  ..$ x-amz-id-2       : chr "0dgc4Iuv3EXdQxMgkh4Qkxt+aADzxsVYp6pq2k3/OjSztFlV1nftjn4MkIvNZ+wCVqzeJsttY44="
  ..$ content-type     : chr "application/xml"
  ..$ transfer-encoding: chr "chunked"
  ..$ date             : chr "Tue, 18 Jun 2019 12:57:45 GMT"
  ..$ server           : chr "AmazonS3"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 - attr(*, "class")= chr "aws_error"
NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose) : 
  Forbidden (HTTP 403).

I expected this to work. Instead, it looks like an authentication issue.

Next?

I am not sure what to try next, as I was hoping the above would work.

Given that it doesn't, do I need to somehow authenticate the EC2 instance (or the R session running on it)? I would think that authenticating with root credentials defeats the purpose of the IAM role (since the root user has permissions for everything, whereas the role associated with the instance only has read/write permissions for S3). So I am not sure that's the right thing to do; it seems there should be a better way, i.e. one that lets S3 know the EC2 instance has a role allowing it access. But I am not sure how to do this or where I went wrong in attempting it.

Upvotes: 2

Views: 6999

Answers (2)

Sourabh

Reputation: 769

Solution using an AWS IAM role:

To use the EC2 or ECS task IAM role, you need aws.ec2metadata, and you must specify the correct region for the S3 bucket.

library(aws.s3)
library(aws.ec2metadata)
Sys.setenv("AWS_DEFAULT_REGION" = 'us-west-2')
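
With the package loaded and the region set, a round trip through the bucket might look like the sketch below. The helper name s3_roundtrip is made up for illustration, and the bucket and region are whatever applies to your setup. It assumes aws.s3 and aws.ec2metadata are installed and that the instance role allows both reads and writes.

```r
# Hypothetical helper: write mtcars to S3 and read it back using the
# credentials of the IAM role attached to the instance.
# Assumes aws.s3 and aws.ec2metadata are installed.
s3_roundtrip <- function(bucket, region, object = "mtcars.csv") {
  stopifnot(
    requireNamespace("aws.s3", quietly = TRUE),
    requireNamespace("aws.ec2metadata", quietly = TRUE)
  )

  # Must match the region the bucket was created in
  Sys.setenv("AWS_DEFAULT_REGION" = region)

  # No key or secret is passed; the role credentials are looked up for us
  aws.s3::s3write_using(mtcars, FUN = write.csv, object = object, bucket = bucket)
  aws.s3::s3read_using(FUN = read.csv, object = object, bucket = bucket)
}

# Usage (not run here; needs a real bucket and an attached role):
# df <- s3_roundtrip("my-unique-bucket", "us-west-2")
```

If this still returns a 403, the region and the role's bucket permissions are the first things to check.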

References

There are a few things to look into here.

1. Include package aws.ec2metadata

As per https://cran.r-project.org/web/packages/aws.s3/readme/README.html, aws.s3 uses https://github.com/cloudyr/aws.signature/ for AWS authentication, and you need aws.ec2metadata for role credentials:

  1. If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.

    When using EC2, note that aws.ec2metadata only supports IMDSv1. If your EC2 instance requires IMDSv2, no metadata will be available.

  2. If R is running on an ECS task, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.

2. Specify correct aws region

  • S3 can be a bit picky about region specifications. bucketlist() will return buckets from all regions, but all other functions require specifying a region. A default of "us-east-1" is relied upon if none is specified explicitly and the correct region can’t be detected automatically. (Note: using an incorrect region is one of the most common - and hardest to figure out - errors when working with S3.)

3. Verify that the EC2/ECS task IAM role has the correct access (read/write) to the S3 bucket.

  • This can be verified with the AWS CLI:
aws s3 ls s3://my_bucket/directory/
aws s3 cp s3://my_bucket/directory/myfile /tmp/
aws s3 cp /tmp/test s3://my_bucket/directory/ # check this only if you need write access
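
The same checks can be approximated from inside R, which also confirms that the R session (not just the CLI) sees the role. This is a rough sketch: can_do and check_bucket_access are hypothetical helper names, and it assumes aws.s3 is installed and that get_bucket() and s3write_using() raise an R error on a 403, as in the output shown in the question.

```r
# Run an action and report whether it completed without an R error.
can_do <- function(action) {
  tryCatch({
    action()
    TRUE
  }, error = function(e) {
    message("Failed: ", conditionMessage(e))
    FALSE
  })
}

# Hypothetical check mirroring the CLI commands above (assumes aws.s3 is installed).
# Note: the write check leaves a small test object in the bucket.
check_bucket_access <- function(bucket, prefix = "") {
  c(
    list_objects = can_do(function() {
      aws.s3::get_bucket(bucket = bucket, prefix = prefix, max = 1)
    }),
    write_object = can_do(function() {
      aws.s3::s3write_using(
        data.frame(ok = TRUE), FUN = write.csv,
        object = paste0(prefix, "access-check.csv"), bucket = bucket
      )
    })
  )
}

# Usage (not run here; needs a real bucket):
# check_bucket_access("my_bucket", prefix = "directory/")
```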

Upvotes: 5

stevec

Reputation: 52678

Accessing S3 data from R

This works from anywhere (e.g. Windows, Mac, etc., whether running on EC2 or not). You need this code in R:

library(aws.s3)

key <- "ALIAVI5FAYD9B(8MVJZ" # Substitute with your own (see below)
secret <- "ePy7jMlRj5jTVAruqmb3uap9bHXmnsSHI+zqfdmHL" # (see below)

Sys.setenv("AWS_ACCESS_KEY_ID" = key,
           "AWS_SECRET_ACCESS_KEY" = secret)

bucketlist() # This returns a list of all your buckets if authentication was successful 

To get it working, you need your own key and secret to substitute in. Doing so takes only a minute. Do the following:

  1. Go to the IAM section in AWS in the browser
  2. Create an IAM user (select 'Programmatic access' for access type)
  3. Give it the predefined 'AmazonS3FullAccess' permission (do this by clicking on 'Attach existing policies directly' and searching for 'AmazonS3FullAccess'). No need for tags or anything else.
  4. Click through and create the user. You'll see the access key and secret in the browser on the last screen.
  5. Put the key and secret in your R code (above) and you're done! That's all there is to it.

Some extra tips

# Write to your S3 bucket
s3write_using(mtcars, FUN = write.csv, object = "mtcars.csv", bucket = "your-bucket-name")

# Read from your S3 bucket
myfile <- s3read_using(FUN = read.csv, object = "mtcars.csv", bucket = "your-bucket-name")

# In case you need to remove the AWS_SESSION_TOKEN environment variable, this will clear it
Sys.unsetenv("AWS_SESSION_TOKEN")
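
If authentication still fails after setting the key and secret, it can help to see which AWS variables the R session actually has set, since a leftover session token from earlier temporary credentials may conflict with long-term keys. A small sketch (aws_env_status is a hypothetical helper name; it only reports whether each variable is non-empty and never prints the values):

```r
# Report which AWS-related environment variables are set in this R session.
# Returns a named logical vector; the values themselves are never shown.
aws_env_status <- function() {
  vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
            "AWS_SESSION_TOKEN", "AWS_DEFAULT_REGION")
  vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
}

aws_env_status()
```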

Upvotes: 0
