Reputation: 449
I already know how to unload a file from Redshift into S3 as one file. I need to know how to unload with the column headers. Can anyone please help or give me a clue?
I don't want to have to do it manually in shell or Python.
Upvotes: 31
Views: 44422
Reputation: 1590
To unload a table as CSV to S3 including the headers, you simply have to do it this way:
UNLOAD ('SELECT * FROM {schema}.{table}')
TO 's3://{s3_bucket}/{s3_key}/{table}/'
with credentials
'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
CSV HEADER ALLOWOVERWRITE PARALLEL OFF;
Upvotes: 1
Reputation: 21
Try it like this, unloading VENUE with a header:
unload ('select * from venue where venueseats > 75000')
to 's3://mybucket/unload/'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
header
parallel off;
The following shows the contents of the output file with a header row:
venueid|venuename|venuecity|venuestate|venueseats
6|New York Giants Stadium|East Rutherford|NJ|80242
78|INVESCO Field|Denver|CO|76125
83|FedExField|Landover|MD|91704
79|Arrowhead Stadium|Kansas City|MO|79451
Upvotes: 2
Reputation: 81
Redshift now supports UNLOAD with headers, added in the September 19–October 10, 2018 release.
The syntax for unloading with headers is:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
HEADER
Upvotes: 8
Reputation: 5243
Unfortunately, the UNLOAD command doesn't natively support this feature (see other answers for how to do it with workarounds).
I've posted a feature request on the AWS forums, so hopefully it gets added someday.
Edit: The feature has now been implemented natively in Redshift! 🎉
Upvotes: 2
Reputation: 1835
As of cluster version 1.0.3945, Redshift supports unloading data to S3 with a header row in each file, e.g.:
UNLOAD('select column1, column2 from mytable;')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
HEADER;
Note: you can't use the HEADER option in conjunction with FIXEDWIDTH.
https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
Upvotes: 38
Reputation: 257
Just to complement the answer: to ensure the header row comes first, you don't have to order by a specific column of the data. You can enclose the UNIONed selects inside another select, add an ordinal column to them, and then order by that column in the outer select without including it in the list of selected columns.
UNLOAD ('
SELECT column_1, column_2 FROM (
SELECT 1 AS i,\'column_1\' AS column_1, \'column_2\' AS column_2
UNION ALL
SELECT 2 AS i, column_1::varchar(255), column_2::varchar(255)
FROM source_table_for_export_to_s3
) t ORDER BY i
')
TO 's3://bucket/path/file_name_for_table_export_in_s3_'
CREDENTIALS
'aws_access_key_id=...;aws_secret_access_key=...'
DELIMITER ','
PARALLEL OFF
ESCAPE
ADDQUOTES;
Upvotes: 12
Reputation: 47
To make the process easier, you can use a pre-built Docker image to extract and include the header row.
https://github.com/openbridge/ob_redshift_unload
It will also do a few other things, but it seemed to make sense to package this in an easy-to-use format.
Upvotes: 1
Reputation: 2305
There is no direct option provided by Redshift UNLOAD.
But we can tweak the query to generate files with a header row added.
First we will use the PARALLEL OFF option so that it creates only one file.
"By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates the following three files."
To have headers in the unloaded files, we will do as below.
Suppose you have a table as below:
create table mytable
(
name varchar(64) default NULL,
address varchar(512) default NULL
)
Then use a SELECT like the one below in your UNLOAD to add the headers as well:
( select 'name','address') union ( select name,address from mytable )
This will add the headers name and address as the first line in your output.
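A minimal sketch of the full UNLOAD built around that query (the bucket path and IAM role ARN are placeholders; both columns are already varchar, so no casts are needed):
unload ('
  (select \'name\', \'address\')
  union
  (select name, address from mytable)
')
to 's3://mybucket/unload/mytable_'                          -- placeholder bucket and prefix
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'   -- placeholder role ARN
delimiter ','
parallel off                                                -- single output file, so the header stays with the data
allowoverwrite;
As other answers note, UNION does not guarantee row order, so add an ORDER BY (or the ordinal-column trick shown above) if the header must land on the first line.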
Upvotes: 12
Reputation: 319
If any of your columns are non-character, then you need to explicitly cast them as char or varchar because the UNION forces a cast.
Here is an example of the full statement that will create a file in S3 with the headers in the first row.
The output file will be a single CSV file with quotes.
This example assumes numeric values in column_1. You will need to adjust the ORDER BY clause to a numeric column to ensure the header row is in row 1 of the S3 file.
******************************************
/* Redshift export to S3 CSV single file with headers - limit 6.2GB */
UNLOAD ('
SELECT \'column_1\',\'column_2\'
UNION
SELECT
CAST(column_1 AS varchar(255)) AS column_1,
CAST(column_2 AS varchar(255)) AS column_2
FROM source_table_for_export_to_s3
ORDER BY 1 DESC
;
')
TO 's3://bucket/path/file_name_for_table_export_in_s3_' credentials
'aws_access_key_id=<key_with_no_<>_brackets>;aws_secret_access_key=<secret_access_key_with_no_<>_brackets>'
PARALLEL OFF
ESCAPE
ADDQUOTES
DELIMITER ','
ALLOWOVERWRITE
GZIP
;
****************************************
Upvotes: 25