Leyth G

Reputation: 1143

Is it possible to restrict access to S3 data from EMR (Zeppelin) by IAM roles?

I have set up an EMR cluster with Zeppelin installed on it. I configured Zeppelin with Active Directory authentication and associated those AD users with IAM roles. I was hoping to restrict access to specific S3 resources after logging in to Zeppelin with the AD credentials. However, Zeppelin doesn't seem to respect the permissions defined on the IAM role. The EMR role has S3 access, so I am wondering whether it is overriding those permissions, or whether it is actually the only role that matters in this scenario.

Does anyone have any idea?

Upvotes: 2

Views: 1341

Answers (2)

Andrew

Reputation: 41

I'm actually about to tackle this problem this week, and I will try to post updates as I have them. I know that this is an old post, but I've found so many helpful things on this site that I figured it might help someone else even if it doesn't help the original poster.

The question was if anyone has any ideas, and I do have an idea. So even though I'm not sure if it will work yet, I'm still posting my idea as a response to the question.

So far, what I've found isn't ideal for large organizations because it requires some per-user modifications on the master node, but I haven't run into any blockers yet for a cluster at the scale I need. At least nothing that can't be fixed with a few configuration management scripts.

The idea is to:

  1. Create a vanilla Amazon EMR cluster
  2. Configure SSL
  3. Configure authentication via Active Directory
  4. (This is the step I am currently on.) Configure Zeppelin to use impersonation (i.e. run the actual notebook processes as the authenticated user). So far this seems to require creating a local OS (Linux) user, with a username matching the AD username, for each user who will authenticate to the Zeppelin UI. One of the impersonation configurations can then make Zeppelin run the notebooks as that OS user (there are a couple of different impersonation configurations possible).
  5. Once impersonation is working, manually configure my own OS account's ~/.aws/credentials and ~/.aws/config files.
  6. Write a Notebook that will test various access combinations based on different policies that will be temporarily attached to my account.

The idea is to have the Zeppelin notebook processes start as the OS user whose name matches the AD-authenticated user, and to place ~/.aws/credentials and ~/.aws/config files in each user's home directory. The hope is that connections to S3 will then follow the policies attached to the AWS account associated with the keys in each user's credentials file.
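For reference, here is a minimal Python sketch of the two per-user files mentioned above, assuming static access keys and the standard AWS shared credentials/config file layout. The function names and the placeholder key values are hypothetical and only illustrate the file format; whether the Zeppelin interpreters actually pick these files up is the part that still has to be tested.

```python
import configparser
import io


def _render(section: str, values: dict) -> str:
    """Serialize a single INI section to text."""
    parser = configparser.ConfigParser()
    parser[section] = values
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def render_aws_credentials(access_key_id: str, secret_access_key: str,
                           profile: str = "default") -> str:
    """Text for ~/.aws/credentials containing one profile."""
    return _render(profile, {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    })


def render_aws_config(region: str, profile: str = "default") -> str:
    """Text for ~/.aws/config; non-default profiles use a 'profile ' prefix."""
    section = "default" if profile == "default" else f"profile {profile}"
    return _render(section, {"region": region})


if __name__ == "__main__":
    # Placeholder values only; real keys would come from each user's IAM account.
    print(render_aws_credentials("EXAMPLE_KEY_ID", "EXAMPLE_SECRET_KEY"))
    print(render_aws_config("us-east-1"))
```

A configuration management script could write these strings into each impersonated user's home directory with owner-only file permissions.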

I'm crossing my fingers that this will work, because if it doesn't, my idea for how to potentially accomplish this will become significantly more complex. I'm planning on continuing to work on this problem tomorrow afternoon. I'll try to post an update when I have made some more progress.

Upvotes: 3

Victor

Reputation: 484

One way to allow access to S3 by IAM user/role is to meet these 2 conditions:

  1. Create an S3 bucket policy that matches the S3 resources to the IAM user/role. This is done under S3/your bucket/Permissions/Bucket Policy. Example:

    {
        "Version": "2012-10-17",
        "Id": "Policy...843",
        "Statement": [
            {
                "Sid": "Stmt...434",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "arn:aws:iam::<account-id>:user/your-s3-user",
                        "arn:aws:iam::<account-id>:role/your-s3-role"
                    ]
                },
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::target-bucket/*",
                    "arn:aws:s3:::other-bucket/specific-resource"
                ]
            }
        ]
    }
    
  2. Allow S3 actions for your IAM user/role. This should be done in IAM/Users/your user/Permissions/Add inline policy. Example:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListAllMyBuckets",
                    "s3:ListBucket"
                ],
                "Resource": "*"
            }
        ]
    }
    

Please note that this might not be the only or the best way, but it worked for me.
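If the bucket policy from condition 1 has to be produced for several users or roles, it can be generated instead of edited by hand. Below is a minimal Python sketch, assuming the same policy shape as the example above; the function name `build_bucket_policy` and the account id `123456789012` are hypothetical placeholders, not values from my setup.

```python
import json


def build_bucket_policy(principal_arns, resource_arns, actions="s3:*"):
    """Build a bucket policy document granting `actions` on `resource_arns`
    to the given IAM user/role ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": actions,
                "Resource": list(resource_arns),
            }
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        # Hypothetical principals; substitute real ARNs.
        ["arn:aws:iam::123456789012:role/your-s3-role"],
        ["arn:aws:s3:::target-bucket/*"],
    )
    # The printed JSON can be pasted into the bucket's Bucket Policy editor.
    print(json.dumps(policy, indent=4))
```

Narrowing `actions` (for example to `["s3:GetObject"]`) and the resource list is where the per-user restriction would actually be expressed.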

Upvotes: 0
