Reputation: 2209
I've just set up an AWS Glue crawler to crawl an S3 bucket. I've set up an IAM Role for the crawler and attached the managed policies "AWSGlueServiceRole" and "AmazonS3FullAccess" to the Role. I've ensured that the crawler is using the role. However, every time I run the crawler I get an error message similar to this in the logs:
ERROR : Error Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: <omitted>; S3 Extended Request ID: <omitted>) retrieving file at s3://my-bucket/snapshots/snapshot-1/mydb/mydb.mytable/11/part-00000-ffffffff-ffff-ffff-ffff-ffffffffffff-c000.gz.parquet. Tables created did not infer schemas from this file.
I've confirmed that a Lambda with "AmazonS3ReadOnlyAccess" attached to its execution role is able to access the bucket. What am I doing wrong?
EDIT: Setting "block all public access" or disabling same has no appreciable effect.
EDIT2: The managed policy documents for the IAM Role are as follows. There are no inline policies.
AWSGlueServiceRole:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"glue:*",
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:ListAllMyBuckets",
"s3:GetBucketAcl",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeRouteTables",
"ec2:CreateNetworkInterface",
"ec2:DeleteNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcAttribute",
"iam:ListRolePolicies",
"iam:GetRole",
"iam:GetRolePolicy",
"cloudwatch:PutMetricData"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:CreateBucket"
],
"Resource": [
"arn:aws:s3:::aws-glue-*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::aws-glue-*/*",
"arn:aws:s3:::*/*aws-glue-*/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::crawler-public*",
"arn:aws:s3:::aws-glue-*"
]
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:*:*:/aws-glue/*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags",
"ec2:DeleteTags"
],
"Condition": {
"ForAllValues:StringEquals": {
"aws:TagKeys": [
"aws-glue-service-resource"
]
}
},
"Resource": [
"arn:aws:ec2:*:*:network-interface/*",
"arn:aws:ec2:*:*:security-group/*",
"arn:aws:ec2:*:*:instance/*"
]
}
]
}
AmazonS3FullAccess:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": "*"
}
]
}
Upvotes: 2
Views: 4770
Reputation: 2209
Turns out the problem was KMS. The bucket contained an export of an Aurora RDS snapshot, and the snapshot was apparently written encrypted. So once I added the following policy, I was set:
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:<region>:<my account id>:key/<my key id>"
]
}
}
Here is my entire managed policy attached to the role (note that the role also has AWSGlueServiceRole
attached):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::my-bucket/snapshots*"
]
},
{
"Effect": "Allow",
"Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:<region>:<my account id>:key/<my key id>"
]
}
]
}
Upvotes: 5