user3435903
user3435903

Reputation: 133

Is it possible to change the delimiter of AWS athena output file

Here is my sample code where I create a file in S3 bucket using AWS Athena. The file by default is in csv format. Is there a way to change it to pipe delimiter ?

import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    client = boto3.client('athena')

    # Start Query Execution
    response = client.start_query_execution(
        QueryString="""
                    select * from srvgrp
                    where category_code = 'ACOMNCDU'
                    """,
        QueryExecutionContext={
            'Database': 'tmp_db'
        },
        ResultConfiguration={
            'OutputLocation': 's3://tmp-results/athena/'
        }
    )
    queryId = response['QueryExecutionId']
    print('Query id is :' + str(queryId))
  

Upvotes: 3

Views: 3541

Answers (1)

Ilya Kisil
Ilya Kisil

Reputation: 2658

There is a way to do that with CTAS query.

BUT:

  • This is a hacky way and not what CTAS queries are supposed to be used for, since it will also create a new table definition in AWS Glue Data Catalog.
  • I'm not sure about performance
CREATE TABLE "UNIQU_PREFIX__new_table"
WITH (
    format = 'TEXTFILE',
    external_location = 's3://tmp-results/athena/__SOMETHING_UNIQUE__',
    field_delimiter = '|',
    bucketed_by = ARRAY['__SOME_COLUMN__'],
    bucket_count = 1
) AS
SELECT *
FROM srvgrp
WHERE category_code = 'ACOMNCDU'

Note:

  1. It is important to set bucket_count = 1, otherwise Athena will create multiple files.

  2. Name of the table in CREATE_TABLE ... also should be unique, e.g. use timestamp prefix/suffix which you can inject during python runtime.

  3. External location should be unique, e.g. use timestamp prefix/suffix which you can inject during python runtime. I would advise to embed table name into S3 path.

  4. You need to include in bucketed_by only one of the columns from SELECT.

  5. At some point you would need to clean up AWS Glue Data Catalog from all table defintions that were created in such way

Upvotes: 2

Related Questions