SBK

Reputation: 101

Querying a Delta Lake Table from Athena using SymlinkTextInputFormat

Problem summary: Querying a Delta Lake table (stored in S3) from AWS Athena fails. I believe the problem occurs specifically when the account has Lake Formation enabled.

Steps to replicate:

    CREATE EXTERNAL TABLE `superstore_delta`(
      `Category` string,
      `SubCategory` string,
      `Sales` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      's3://$bucket/$prefix/_symlink_format_manifest/'

Querying the table then fails with:

    Permission denied on S3 path: s3a://$bucket/superstore_delta/part-00000-81186b2b-ee07-4543-ab15-8c8cfce2ed0d-c000.snappy.parquet

Has anyone else faced this issue while working with Delta Lake tables and Lake Formation?

P.S. Querying works if I use a completely unencrypted S3 bucket, even with Lake Formation enabled.
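Why does the error name a Parquet file rather than the `_symlink_format_manifest/` folder from the `LOCATION` clause? With `SymlinkTextInputFormat`, the table location only holds Delta Lake's manifest files, which are plain text listing the absolute paths of the table's data files, one per line; the engine reads those paths and then opens each Parquet file directly, under the Delta table root. The sketch below is only an illustration of that indirection, not how Athena is implemented: the function names and the `my-bucket` paths are hypothetical. It lists the manifest entries that fall outside the `LOCATION` prefix, i.e. the extra objects that whichever role reads the data must also be allowed to access.

```python
from typing import List
from urllib.parse import urlparse


def data_files_from_manifest(manifest_text: str) -> List[str]:
    """Return the data-file paths listed in a symlink manifest (one per line)."""
    return [line.strip() for line in manifest_text.splitlines() if line.strip()]


def paths_outside_location(paths: List[str], table_location: str) -> List[str]:
    """Return paths whose bucket/key is not under the table's LOCATION prefix.

    The URL scheme (s3:// vs s3a://) is ignored; only bucket and key are compared.
    """
    loc = urlparse(table_location)
    outside = []
    for path in paths:
        parsed = urlparse(path)
        if parsed.netloc != loc.netloc or not parsed.path.startswith(loc.path):
            outside.append(path)
    return outside


if __name__ == "__main__":
    # Hypothetical manifest contents for a hypothetical bucket "my-bucket".
    manifest = (
        "s3a://my-bucket/superstore_delta/part-00000-a.snappy.parquet\n"
        "s3a://my-bucket/superstore_delta/part-00001-b.snappy.parquet\n"
    )
    location = "s3://my-bucket/superstore_delta/_symlink_format_manifest/"
    for path in paths_outside_location(data_files_from_manifest(manifest), location):
        print("also needs read access:", path)
```

In this example both data files live under `superstore_delta/`, outside the manifest folder, so read access to the manifest location alone is not enough to query the table.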

Upvotes: 1

Views: 2921

Answers (2)

Amine Ait el harraj

Reputation: 1

Based on recent announcements, it seems that Lake Formation now supports table-, column-, row-, and cell-level permissions for symlink tables: https://docs.aws.amazon.com/lake-formation/latest/dg/athena-lf.html

Upvotes: 0

Viktor

Reputation: 324

Most likely, the IAM role associated with the Amazon S3 path (the one you specify in AWS Lake Formation -> Data lake locations) was not explicitly granted S3 read permissions in the S3 bucket's resource policy. Note that this is not the role you query with in Athena, but the role associated with the S3 location registered in Lake Formation.
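The fix described above amounts to a bucket-policy statement naming the Lake Formation location role as the principal. The helper below is a hypothetical sketch that builds such a policy document: it grants `s3:ListBucket` on the bucket and `s3:GetObject` on objects under the table prefix. The bucket name, prefix, and role ARN are placeholders, and your setup may need a different set of actions.

```python
import json


def lf_location_read_policy(bucket: str, prefix: str, role_arn: str) -> dict:
    """Build a bucket policy letting the Lake Formation location role read a prefix.

    Hypothetical helper: role_arn is the role registered for the data lake
    location in Lake Formation, not the role used to run queries in Athena.
    """
    prefix = prefix.strip("/")
    principal = {"AWS": role_arn}
    # Object ARNs under the prefix (or the whole bucket if no prefix is given).
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket ARN itself.
                "Sid": "LakeFormationLocationRoleList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading is granted on the object ARNs (manifest and Parquet files).
                "Sid": "LakeFormationLocationRoleRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": objects,
            },
        ],
    }


if __name__ == "__main__":
    policy = lf_location_read_policy(
        "my-bucket",  # hypothetical bucket name
        "superstore_delta",
        "arn:aws:iam::111122223333:role/LakeFormationLocationRole",  # hypothetical role
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON string could be applied with boto3's `put_bucket_policy(Bucket=..., Policy=...)`. That call replaces the bucket's entire existing policy, so merge these statements into any policy already on the bucket first.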

Upvotes: 0
