Tokunbo Hiamang

Reputation: 449

Unloading from redshift to s3 with headers

I already know how to unload data from Redshift into S3 as a single file. I need to know how to unload it with the column headers. Can anyone help or give me a clue?

I don't want to manually have to do it in shell or python.

Upvotes: 31

Views: 44422

Answers (9)

sambeth

Reputation: 1590

To unload a table as CSV to S3 including the headers, you can simply do it this way:

UNLOAD ('SELECT * FROM {schema}.{table}')
        TO 's3://{s3_bucket}/{s3_key}/{table}/'
        with credentials
        'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
        CSV HEADER ALLOWOVERWRITE PARALLEL OFF;

Upvotes: 1

karthik G

Reputation: 21

Try like this:

Unload VENUE with a Header:

unload ('select * from venue where venueseats > 75000')
to 's3://mybucket/unload/' 
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
header
parallel off;

The following shows the contents of the output file with a header row:

venueid|venuename|venuecity|venuestate|venueseats
6|New York Giants Stadium|East Rutherford|NJ|80242
78|INVESCO Field|Denver|CO|76125
83|FedExField|Landover|MD|91704
79|Arrowhead Stadium|Kansas City|MO|79451

Upvotes: 2

santhosh

Reputation: 81

Redshift now supports UNLOAD with headers; the feature was added in the September 19–October 10, 2018 release.

The syntax for unloading with headers is:

UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
HEADER

Upvotes: 8

Marco Roy

Reputation: 5243

Unfortunately, the UNLOAD command doesn't natively support this feature (see other answers for how to do it with workarounds).

I've posted a feature request on the AWS forums, so hopefully it gets added someday.

Edit: The feature has now been implemented natively in Redshift! 🎉

Upvotes: 2

fez

Reputation: 1835

As of cluster version 1.0.3945, Redshift supports unloading data to S3 with a header row in each file, i.e.:

UNLOAD('select column1, column2 from mytable;')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
HEADER;

Note: you can't use the HEADER option in conjunction with FIXEDWIDTH.

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Upvotes: 38

Kamran

Reputation: 257

Just to complement the other answers: to ensure the header row comes first, you don't have to order by a specific data column. You can enclose the UNIONed selects inside another select, add an ordinal column to them, and then order by that column in the outer select without including it in the list of selected columns.

UNLOAD ('
  SELECT column_1, column_2 FROM (
     SELECT 1 AS i, \'column_1\' AS column_1, \'column_2\' AS column_2
     UNION ALL
     SELECT 2 AS i, column_1::varchar(255), column_2::varchar(255)
     FROM source_table_for_export_to_s3
  ) t ORDER BY i
')
TO 's3://bucket/path/file_name_for_table_export_in_s3_'
CREDENTIALS
 'aws_access_key_id=...;aws_secret_access_key=...'
DELIMITER ','
PARALLEL OFF
ESCAPE
ADDQUOTES;

Upvotes: 12

Thomas Spicer

Reputation: 47

To make the process easier, you can use a pre-built Docker image to extract and include the header row.

https://github.com/openbridge/ob_redshift_unload

It will also do a few other things, but it seemed to make sense to package this in an easy-to-use format.

Upvotes: 1

Sandesh Deshmane

Reputation: 2305

There is no direct option provided by Redshift UNLOAD.

But we can tweak the query to generate files with a header row added.

First we will use the PARALLEL OFF option so that it creates only one file.

"By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates three files."

To have headers in the unloaded files, we do as below.

Suppose you have a table as below:

create table mytable
(
    name varchar(64) default NULL,
    address varchar(512) default NULL
);

Then use a SELECT like the one below in your UNLOAD to add the headers as well:

( select 'name','address' ) union ( select name, address from mytable )

This will add the headers name and address as the first line of your output.
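
For completeness, here is a minimal sketch of the full UNLOAD around that SELECT. The bucket, prefix, and IAM role ARN are placeholders, not values from this answer, and note that a plain UNION does not guarantee the header row comes out first, so you may still want the ORDER BY trick shown in the other answers:

-- Sketch only: replace the bucket/prefix and role ARN with your own values.
UNLOAD ('
  (SELECT \'name\', \'address\')
  UNION ALL
  (SELECT name, address FROM mytable)
')
TO 's3://your-bucket/your-prefix/mytable_'
IAM_ROLE 'arn:aws:iam::123456789012:role/YourRedshiftRole'
DELIMITER ','
PARALLEL OFF      -- write a single file so the header row stays with all the data
ALLOWOVERWRITE;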

Upvotes: 12

Douglas Hackney

Reputation: 319

If any of your columns are non-character, you need to explicitly cast them to char or varchar, because the UNION with the header string literals requires the column types to match.

Here is an example of the full statement that will create a file in S3 with the headers in the first row.

The output file will be a single CSV file with quotes.

This example assumes numeric values in column_1. You will need to adjust the ORDER BY clause to a numeric column to ensure the header row is in row 1 of the S3 file.

/* Redshift export to S3 CSV single file with headers - limit 6.2GB */

UNLOAD ('

    SELECT \'column_1\', \'column_2\'

    UNION

    SELECT
      CAST(column_1 AS varchar(255)) AS column_1,
      CAST(column_2 AS varchar(255)) AS column_2
    FROM source_table_for_export_to_s3

    ORDER BY 1 DESC
    ;

')
TO 's3://bucket/path/file_name_for_table_export_in_s3_' credentials
 'aws_access_key_id=<key_with_no_<>_brackets>;aws_secret_access_key=<secret_access_key_with_no_<>_brackets>'
PARALLEL OFF
ESCAPE
ADDQUOTES
DELIMITER ','
ALLOWOVERWRITE
GZIP
;

Upvotes: 25
