Reputation: 2611
I have an instance that needs to read data from S3 buckets in two different accounts.
I have console access to both accounts, so now I need to configure bucket policies that allow the instance to read S3 data from the buckets dataaccountlogs and userlogs. My instance is running in UserAccount.
I need to access these two buckets both from the command line and from a Spark job.
Upvotes: 0
Views: 904
Reputation: 443
You will need a role in UserAccount, say RoleA, which will be used to access the mentioned buckets. The role should have permissions for the required S3 operations.
Then you will be able to configure a bucket policy for each bucket:
For DataAccount:
{
  "Version": "2012-10-17",
  "Id": "Policy1",
  "Statement": [
    {
      "Sid": "test1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::UserAccount:role/RoleA"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::dataaccountlogs",
        "arn:aws:s3:::dataaccountlogs/*"
      ]
    }
  ]
}
For UserAccount:
{
  "Version": "2012-10-17",
  "Id": "Policy1",
  "Statement": [
    {
      "Sid": "test1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::UserAccount:role/RoleA"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::userlogs",
        "arn:aws:s3:::userlogs/*"
      ]
    }
  ]
}
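Since the question only asks for read access, a narrower policy than s3:* may be enough. Here is a minimal Python sketch that generates such a policy for both buckets. The function name read_only_bucket_policy and the account ID 111111111111 are hypothetical placeholders (substitute the real ID of UserAccount), and the choice of s3:ListBucket plus s3:GetObject is an assumption about what "read" needs to cover in your case.

```python
import json


def read_only_bucket_policy(bucket: str, role_arn: str) -> dict:
    """Build a bucket policy that lets one role list and read a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical 12-digit account ID standing in for UserAccount
ROLE_ARN = "arn:aws:iam::111111111111:role/RoleA"

if __name__ == "__main__":
    for bucket in ("dataaccountlogs", "userlogs"):
        print(json.dumps(read_only_bucket_policy(bucket, ROLE_ARN), indent=2))
```

The printed JSON can be pasted into the bucket policy editor in the console of the account that owns each bucket.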
To access them from the command line:
You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html
Then you will need to configure a profile for using your role. First, create a profile for your user to log in:
aws configure --profile YourProfileAlias
Follow the prompts to set up credentials.
Then edit the config file, ~/.aws/config, and add a profile for the role.
Add this block to the end:
[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both buckets on behalf of the created role.
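If you prefer to generate the role profile block rather than type it by hand, something like the following Python sketch could work. It only uses the standard library configparser module. The helper name role_profile_block and the account ID 111111111111 are hypothetical, and the profile names are the placeholders used above.

```python
import configparser
import io


def role_profile_block(role_profile: str, role_arn: str, source_profile: str) -> str:
    """Render an AWS CLI config section for a profile that assumes a role."""
    parser = configparser.ConfigParser()
    # Named profiles in ~/.aws/config use the "profile <name>" section header
    parser[f"profile {role_profile}"] = {
        "role_arn": role_arn,
        "source_profile": source_profile,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical account ID; replace with the account that owns RoleA
    print(role_profile_block(
        "YourRoleProfileName",
        "arn:aws:iam::111111111111:role/RoleA",
        "YourProfileAlias",
    ))
```

Append the printed text to ~/.aws/config, then check access with, for example, aws s3 ls s3://dataaccountlogs --profile YourRoleProfileName.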
To access from spark:
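Note: if you rely on the EMR S3-optimized committer, you should strictly use the s3 scheme, not s3a. That committer also has a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
Alternatively, use the S3A connector with an assumed role by setting the credentials provider: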
spark.hadoop.fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.AssumedRoleCredentialProvider,org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider"
Then configure your role to be used:
spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::UserAccount:role/RoleA
This way is more general now, since the EMR committer has various limitations. You can find more information on configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html
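The two S3A settings above can also be passed on the command line instead of being placed in a config file. The Python sketch below assembles the corresponding spark-submit --conf arguments. The helper names, the account ID 111111111111 and the script name your_job.py are hypothetical, and it assumes the provider class in the org.apache.hadoop.fs.s3a.auth package, as named in the linked Hadoop page.

```python
import shlex


def assumed_role_conf(role_arn: str) -> dict:
    """Spark properties that make S3A assume the given role."""
    return {
        # Assumption: class name as given in the Hadoop assumed-roles docs
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
        "spark.hadoop.fs.s3a.assumed.role.arn": role_arn,
    }


def spark_submit_args(conf: dict) -> list:
    """Turn a property dict into a flat list of --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical account ID; replace with the account that owns RoleA
    conf = assumed_role_conf("arn:aws:iam::111111111111:role/RoleA")
    quoted = " ".join(shlex.quote(a) for a in spark_submit_args(conf))
    print(f"spark-submit {quoted} your_job.py")
```

Inside the job, the buckets would then be read with s3a:// paths such as s3a://dataaccountlogs/.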
Upvotes: 1