Reputation: 13
I'm working on a Spring project that needs to export Redshift table data into a single local CSV file. The current approach is:
UNLOAD (
'SELECT DISTINCT #{#TYPE_ID}
FROM target_audience
WHERE #{#TYPE_ID} is not null
AND #{#TYPE_ID} != \'\'
GROUP BY #{#TYPE_ID}'
)
TO '#{#s3basepath}#{#s3jobpath}target_audience#{#unique}_'
credentials 'aws_access_key_id=#{#accesskey};aws_secret_access_key=#{#secretkey}'
DELIMITER AS ',' ESCAPE GZIP ;
The above approach works fine, but I think the overall performance could be improved by, for example, skipping the S3 part and getting the data directly from Redshift to the local machine.
After searching through online resources, I found that you can export data from Redshift directly through psql, or run SELECT queries and move the result data yourself. But neither option can match the performance of Redshift UNLOAD with parallel writing.
So is there any way I can mimic UNLOAD's parallel writing to achieve the same performance without having to go through S3?
Upvotes: 1
Views: 2539
Reputation: 270124
You can avoid the need to join files together by using UNLOAD with the PARALLEL OFF parameter. It will output only one file.
It will, however, still create multiple files if the file size exceeds 6.2 GB.
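For example, a minimal sketch of your UNLOAD forced into a single output file (the bucket path, credentials, and the type_id column are placeholders standing in for your #{#...} values):

-- PARALLEL OFF writes the result sequentially into a single file
-- (it is split only if the output exceeds 6.2 GB)
UNLOAD ('SELECT DISTINCT type_id FROM target_audience WHERE type_id IS NOT NULL AND type_id != \'\'')
TO 's3://your-bucket/your-prefix/target_audience_'
CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
DELIMITER AS ',' ESCAPE GZIP
PARALLEL OFF;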
It is doubtful that you would get better performance by running psql, but if performance is important to you, you can certainly test the various methods.
Upvotes: 0
Reputation: 5739
We do exactly the same thing you're trying to do here. In our performance comparison it turned out to be almost the same, or in some cases even better, for our use case. It is also easier to program and debug, since there is practically only one step.
# replace user, password, host, region and dbname appropriately in the command below
psql "postgresql://user:password@your-cluster.region.redshift.amazonaws.com:5439/dbname?sslmode=require" -c "select C1,C2 from sch1.tab1" > ABC.csv
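Note that by default psql prints an aligned, pipe-separated table rather than true CSV. A small variation (same placeholder connection string as above) that produces comma-separated output uses psql's unaligned mode:

# -A = unaligned output, -F ',' = comma field separator, -t = rows only (no header/footer)
# caveat: values that themselves contain commas are not quoted in this mode
psql "postgresql://user:password@your-cluster.region.redshift.amazonaws.com:5439/dbname?sslmode=require" -A -F ',' -t -c "select C1,C2 from sch1.tab1" > ABC.csv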
This enables us to avoid the three steps of the UNLOAD approach (unloading to S3, downloading from S3, and merging the parallel files).
On the other hand, it also saves some cost (S3 storage, though that is negligible).
By the way, from pgsql 9.0 onwards, sslcompression is on by default.
Upvotes: 0