Viet Hoang

Reputation: 13

Is there any option that performs better for exporting data locally than Redshift UNLOAD via S3?

I'm working on a Spring project that needs to export Redshift table data into a single local CSV file. The current approach is to:

  1. Execute a Redshift UNLOAD via JDBC, which writes the data across multiple files in S3
  2. Download those files from S3 to local storage
  3. Join them together into one single CSV file (a rough sketch of steps 2 and 3 follows the UNLOAD statement below)
UNLOAD (
  'SELECT DISTINCT #{#TYPE_ID} 
  FROM target_audience 
  WHERE #{#TYPE_ID} is not null 
  AND #{#TYPE_ID} != \'\' 
  GROUP BY #{#TYPE_ID}'
) 
TO '#{#s3basepath}#{#s3jobpath}target_audience#{#unique}_' 
credentials 'aws_access_key_id=#{#accesskey};aws_secret_access_key=#{#secretkey}' 
DELIMITER AS ',' ESCAPE GZIP ;
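
For reference, steps 2 and 3 boil down to something like the following sketch, assuming the AWS SDK for Java v2; the class name, bucket, prefix and target path are illustrative placeholders rather than the project's actual code:

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Illustrative sketch of steps 2 and 3: list the UNLOAD parts, stream each one
// down from S3, decompress it (UNLOAD ... GZIP writes gzip-compressed parts)
// and append it to a single local CSV file.
public class UnloadPartMerger {

    public static void merge(S3Client s3, String bucket, String prefix, Path targetCsv) throws Exception {
        // Note: for more than 1000 parts you would use listObjectsV2Paginator instead.
        ListObjectsV2Request listRequest = ListObjectsV2Request.builder()
                .bucket(bucket)
                .prefix(prefix)
                .build();
        try (OutputStream out = Files.newOutputStream(targetCsv)) {
            for (S3Object part : s3.listObjectsV2(listRequest).contents()) {
                GetObjectRequest getRequest = GetObjectRequest.builder()
                        .bucket(bucket)
                        .key(part.key())
                        .build();
                try (InputStream in = new GZIPInputStream(s3.getObject(getRequest))) {
                    in.transferTo(out);   // append this part's rows to the merged CSV
                }
            }
        }
    }
}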

The above approach has been working fine, but I think the overall performance can be improved, for example by skipping the S3 part and getting the data directly from Redshift to local storage.

After searching through online resources, I found that you can export data from Redshift directly through psql, or run the SELECT queries and move the result data myself. But neither option can match Redshift UNLOAD's performance with parallel writing.

So is there any way I can mimic UNLOAD's parallel writing to achieve the same performance without having to go through S3?

Upvotes: 1

Views: 2539

Answers (2)

John Rotenstein

Reputation: 270124

You can avoid the need to join files together by using UNLOAD with the PARALLEL OFF parameter. It will output only one file.
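
For example, the UNLOAD from the question would only need one extra parameter at the end (query and credentials abbreviated here):

UNLOAD ('SELECT DISTINCT ...')
TO '#{#s3basepath}#{#s3jobpath}target_audience#{#unique}_'
credentials 'aws_access_key_id=...;aws_secret_access_key=...'
DELIMITER AS ',' ESCAPE GZIP
PARALLEL OFF;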

It will, however, still create multiple files if the file size exceeds 6.2 GB.

See: UNLOAD - Amazon Redshift

It is doubtful that you would get better performance by running psql, but if performance is important for you then you can certainly test the various methods.

Upvotes: 0

Red Boy

Reputation: 5739

We do exactly the same thing you're trying to do here. In our performance comparison it turned out to be almost the same, and in some cases even better, for our use case. It is also easier to program and debug, since there is practically only one step.

# replace user, password, host, region and dbname appropriately in the command below
psql "postgresql://user:password@host.region.redshift.amazonaws.com:5439/dbname?sslmode=require" -c "select C1,C2 from sch1.tab1" > ABC.csv

This enables us to avoid three steps:

  1. Unload using JDBC
  2. Download the exported data from S3
  3. Decompress the gzip files (we used GZIP to save network input/output)

On the other hand, it also saves some cost (S3 storage, though that is negligible). By the way, from PostgreSQL 9.0 onwards, sslcompression is on by default.

Upvotes: 0
