Reputation: 4951
We use S3 for our data lake, which is partitioned by customerId, and we query it with Athena.
We already apply fine-grained access control when reading data from DynamoDB and S3 through the SDK.
Is there a way to do the same with Athena, so that access control is enforced at the storage level rather than just by filtering on customerId in memory?
Upvotes: 0
Views: 383
Reputation: 132972
Permissions in Athena (assuming you don't use Lake Formation) are a combination of Athena, Glue, and S3 permissions. The S3 permissions are the most important ones since they govern which data the user has access to.
If your data is partitioned by customer ID, each customer's data is in a distinct prefix on S3. When you create IAM permissions for a user, you can scope that user's permissions to one or more prefixes.
Here is a fragment of an IAM statement that grants GetObject permission only in a specific prefix.
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": ["arn:aws:s3:::bucket_name/prefix_including_customer_id/*"]
}
The Resource value is an array, so you can specify multiple prefixes.
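As a rough sketch of how such a statement could be generated for several prefixes at once, the helper below builds the same structure in Python. The function name `get_object_statement` and the Hive-style prefixes (`customer_data/customerId=1001`) are assumptions made for illustration; substitute whatever layout your bucket actually uses.

```python
import json


def get_object_statement(bucket, prefixes):
    """Build an Allow statement for s3:GetObject limited to the given prefixes."""
    # One ARN per prefix; the trailing /* covers every object under that prefix.
    resources = [
        f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*" for prefix in prefixes
    ]
    return {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": resources}


# Hypothetical bucket name and partition prefixes for one user.
statement = get_object_statement(
    "bucket_name",
    ["customer_data/customerId=1001", "customer_data/customerId=1002"],
)
print(json.dumps(statement, indent=2))
```

The printed JSON can be placed in the Statement list of an IAM policy document.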
You also need to grant s3:ListBucket permissions. If getting a listing of objects is not sensitive, you can grant this for the whole bucket; otherwise you need a slightly different statement to limit list permissions to the same prefixes:
{
  "Effect": "Allow",
  "Action": ["s3:ListBucket"],
  "Resource": ["arn:aws:s3:::bucket_name"],
  "Condition": {
    "StringLike": {
      "s3:prefix": ["prefix_including_customer_id/*"]
    }
  }
}
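Putting the two kinds of statement together, a complete policy document might be assembled along these lines. This is only a sketch: `build_policy` is a hypothetical helper, the prefix layout is assumed, and a real Athena user would need further permissions that are not shown here.

```python
import json


def build_policy(bucket, prefixes):
    """Assemble a policy that limits both reads and listings to the given prefixes."""
    clean = [prefix.strip("/") for prefix in prefixes]
    get_objects = {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        # Object-level permission: the resource is the object ARN pattern.
        "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*" for prefix in clean],
    }
    list_bucket = {
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        # Bucket-level permission: the resource is the bucket itself,
        # and the condition narrows which prefixes may be listed.
        "Resource": [f"arn:aws:s3:::{bucket}"],
        "Condition": {
            "StringLike": {"s3:prefix": [f"{prefix}/*" for prefix in clean]}
        },
    }
    return {"Version": "2012-10-17", "Statement": [get_objects, list_bucket]}


# Hypothetical bucket name and partition prefix.
policy = build_policy("bucket_name", ["customer_data/customerId=1001/"])
print(json.dumps(policy, indent=2))
```

Generating both statements from the same prefix list keeps the read and list permissions from drifting apart.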
With a policy containing these types of statements, the user can only read objects under the prefixes they have access to. Trying to read other objects, for example by running a query like SELECT * FROM customer_data, will result in access denied errors. Queries succeed only when they filter on partition key values that match the S3 prefixes the user has access to.
Users will still be able to see all values of a partition key (just not the data within the partitions); that is unavoidable.
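To illustrate the kind of partition-filtered query that would succeed under such a policy, here is a minimal sketch using boto3. It assumes the partition column is a string named customerId; the helper names, the database, and the output location are all hypothetical. The query runs with the caller's own credentials, so the S3 permissions above are what decide whether it succeeds.

```python
import re


def build_customer_query(table, customer_id):
    """Return a query that filters on the customerId partition key."""
    # The values are interpolated into SQL, so accept only simple tokens.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[A-Za-z0-9_-]+", customer_id):
        raise ValueError(f"unexpected customer ID: {customer_id!r}")
    return f"SELECT * FROM {table} WHERE customerId = '{customer_id}'"


def run_query(query, database, output_location):
    """Submit the query to Athena and return its execution ID."""
    import boto3  # imported lazily so the query builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


query = build_customer_query("customer_data", "1001")
print(query)
```

Calling `run_query(query, "my_database", "s3://my-results-bucket/athena/")` would submit it; both arguments are placeholders for your own database and results location.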
Upvotes: 1