Reputation: 101
Problem summary: Querying a Delta Lake table (stored in S3) via AWS Athena fails. I believe the problem occurs specifically when the account has Lake Formation enabled.
Steps to replicate:
CREATE EXTERNAL TABLE `superstore_delta`(
`Category` string,
`SubCategory` string,
`Sales` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://$bucket/$prefix/_symlink_format_manifest/'
Querying this table in Athena then fails with:
Permission denied on S3 path: s3a://$bucket/superstore_delta/part-00000-81186b2b-ee07-4543-ab15-8c8cfce2ed0d-c000.snappy.parquet
Has anyone else faced this issue while working with Delta Lake tables and Lake Formation?
P.S. Querying works if I use a completely unencrypted S3 bucket, even with Lake Formation enabled.
Upvotes: 1
Views: 2921
Reputation: 1
Based on the latest news, it seems that Lake Formation now supports table-, column-, row-, and cell-level permissions for symlink tables. See https://docs.aws.amazon.com/lake-formation/latest/dg/athena-lf.html
Upvotes: 0
Reputation: 324
Most likely, the IAM role associated with the Amazon S3 path (the one you specify in AWS Lake Formation -> Data lake locations) was not explicitly added to the S3 bucket resource policy with S3 read permissions. Note that this is not the role you query with in Athena, but the role associated with the S3 location in Lake Formation.
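To make that concrete, here is a rough sketch of a bucket policy that grants read access to the role registered with the data lake location. The bucket name, role ARN, and the `build_read_policy` helper are hypothetical placeholders, and the exact set of actions you need may differ in your setup.

```python
import json


def build_read_policy(bucket: str, role_arn: str) -> dict:
    """Build a bucket policy document granting S3 read access to one IAM role.

    Hypothetical helper for illustration; it only constructs the JSON document.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "AllowLakeFormationRoleList",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level reads apply to the keys under the bucket.
                "Sid": "AllowLakeFormationRoleRead",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholder values (assumptions): substitute your own bucket and the role
# registered for the S3 location in Lake Formation.
policy = build_read_policy(
    "my-delta-bucket",
    "arn:aws:iam::123456789012:role/LakeFormationDataAccessRole",
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON can be pasted into the bucket's policy editor or applied with boto3's `put_bucket_policy`. Applying a policy replaces the existing one, so merge these statements into whatever policy the bucket already has. Since the question mentions that an unencrypted bucket works, the encryption key's permissions may also play a part; this sketch does not cover that.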
Upvotes: 0