Reputation: 13524
I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have searched everywhere but couldn't find anything about how to offload files from Amazon Redshift to S3 in Parquet format. Is this feature not supported yet, or was I just unable to find the documentation for it? Could somebody who has worked on it shed some light on this? Thank you.
Upvotes: 12
Views: 16555
Reputation: 181
Redshift UNLOAD to the Parquet file format has been supported since December 2019:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
FORMAT PARQUET
It is mentioned in Redshift Features, the syntax has been updated in the Unload Document, and an example is provided in the Unload Examples Document.
Excerpt from the official documentation:
The following example unloads the LINEITEM table in Parquet format, partitioned by the l_shipdate column.
unload ('select * from lineitem')
to 's3://mybucket/lineitem/'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
PARQUET
PARTITION BY (l_shipdate);
Assuming four slices, the resulting Parquet files are dynamically partitioned into various folders.
s3://mybucket/lineitem/l_shipdate=1992-01-02/0000_part_00.parquet
                                             0001_part_00.parquet
                                             0002_part_00.parquet
                                             0003_part_00.parquet
s3://mybucket/lineitem/l_shipdate=1992-01-03/0000_part_00.parquet
                                             0001_part_00.parquet
                                             0002_part_00.parquet
                                             0003_part_00.parquet
s3://mybucket/lineitem/l_shipdate=1992-01-04/0000_part_00.parquet
                                             0001_part_00.parquet
                                             0002_part_00.parquet
                                             0003_part_00.parquet
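To then query the unloaded files with Redshift Spectrum, you define an external table over that S3 prefix and register the partitions. The following is a minimal sketch, not from the official documentation: it assumes an external schema named "spectrum" has already been created with CREATE EXTERNAL SCHEMA, and the column list is illustrative rather than the full LINEITEM definition.
-- Sketch: expose the unloaded Parquet files as a Spectrum external table.
-- Assumes an external schema "spectrum" already exists; columns are illustrative.
create external table spectrum.lineitem (
    l_orderkey bigint,
    l_quantity decimal(12,2)
)
partitioned by (l_shipdate date)
stored as parquet
location 's3://mybucket/lineitem/';

-- Register one of the partitions that UNLOAD created, then query it.
alter table spectrum.lineitem
add partition (l_shipdate='1992-01-02')
location 's3://mybucket/lineitem/l_shipdate=1992-01-02/';

select count(*) from spectrum.lineitem where l_shipdate = '1992-01-02';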
Upvotes: 18
Reputation: 28698
You can't do this. Redshift doesn't know about Parquet (although you can read Parquet files through the Spectrum abstraction).
You can UNLOAD to text files. They can be encrypted or zipped, but they are only ever flat text files.
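For reference, a minimal sketch of such a text-format UNLOAD; the bucket, output prefix, and IAM role ARN below are placeholders reused from the example elsewhere on this page.
-- Sketch of a classic text UNLOAD: pipe-delimited, gzip-compressed flat files.
-- The bucket, prefix, and IAM role ARN are placeholders, not real values.
unload ('select * from lineitem')
to 's3://mybucket/lineitem_text/'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
delimiter '|'
gzip;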
Update: it looks like this is now supported; see the FORMAT PARQUET answer above.
Upvotes: 3
Reputation: 946
A great solution, 'Spectrify', does this. If you don't want to go that route, you could use Spark on EMR together with the Databricks spark-redshift library to read data from Redshift and write it to S3 in Parquet format.
The following link will give you an idea of how to do the same:
https://github.com/aws-samples/aws-big-data-blog/tree/master/aws-blog-spark-parquet-conversion
Upvotes: 0
Reputation: 1080
Unfortunately, so far, AWS Redshift has not extended its ability to unload in the Parquet format.
There are workarounds (see the other answers), but as of today there is no out-of-the-box support for Apache Parquet in Redshift.
I hope this helps.
Upvotes: 1