Nir Ben Yaacov

Reputation: 1182

Should I use the UNLOAD command often on Redshift?

We currently use an ETL tool (on top of Hadoop) so that our non-technical users can create a CSV file of raw data. Dev builds a process for them according to their needs, and they run it on demand. Because the output is built from the data lake files (S3), we have to join all the facts together and run some heavy-duty jobs, which take Hadoop a while.

We would like these processes to run in less time. My thought is to use Redshift for this task via the UNLOAD command. Since the data in Redshift is already modeled according to business requirements, getting users what they want is usually a very simple query that runs for 2-5 minutes.

However, I am not sure whether giving our users the option of running UNLOAD on demand (not by themselves, but through the built process) would put a strain on Redshift.

Can anyone provide some info on this? We expect about 20 queries a day, each running 2-4 minutes.

Thanks

Nir

Upvotes: 1

Views: 513

Answers (2)

pcothenet

Reputation: 401

I run hundreds of UNLOADs a day, whether to send Redshift data to external APIs, to back up and restore, or to deep copy tables. I have never run into a problem.

Performance seems to be about the same as the equivalent SELECT (it takes a little longer if you use compression).
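To make the "equivalent SELECT" point concrete, here is a minimal sketch of how an on-demand process might wrap a user's query in an UNLOAD statement. The helper name `build_unload_statement`, the bucket path, and the IAM role ARN are all hypothetical; only the UNLOAD clauses themselves (`TO`, `IAM_ROLE`, `FORMAT AS CSV`, `HEADER`, `GZIP`) are Redshift syntax. It only builds the SQL text and does not connect to a cluster.

```python
def build_unload_statement(select_sql, s3_prefix, iam_role_arn, gzip=True):
    """Wrap a SELECT in a Redshift UNLOAD statement (hypothetical helper)."""
    # UNLOAD takes the query as a quoted string, so single quotes inside
    # the query have to be doubled.
    escaped = select_sql.replace("'", "''")
    options = ["FORMAT AS CSV", "HEADER"]
    if gzip:
        # Compression is optional; it is the part that may add a little time.
        options.append("GZIP")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        + "\n".join(options)
        + ";"
    )


# Example values below are placeholders, not real resources.
print(build_unload_statement(
    "SELECT * FROM sales WHERE region = 'EU'",
    "s3://example-bucket/exports/sales_",
    "arn:aws:iam::123456789012:role/ExampleUnloadRole",
))
```

The resulting string can be sent through whatever SQL client the process already uses, exactly as the plain SELECT would be.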

Upvotes: 0

mike_pdb

Reputation: 2828

It's not much more demanding than any other SELECT. I suggest that you define a dedicated WLM queue for these users. That way you can limit the resources they use and isolate any impact they have from the rest of the system.
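One way the built process could send its statements to such a queue is by setting a query group for the session. The sketch below assumes manual WLM with a queue already configured in the cluster's parameter group to match the query group label `unload_users`; that label and the helper name `route_to_queue` are made up for illustration. `SET query_group` and `RESET query_group` are Redshift commands.

```python
def route_to_queue(statement, query_group="unload_users"):
    """Return the statements to run, in order, so that `statement` is
    routed to the WLM queue matching `query_group` (hypothetical helper)."""
    if "'" in query_group:
        raise ValueError("query group label must not contain single quotes")
    return [
        f"SET query_group TO '{query_group}';",  # route following queries
        statement,                               # the UNLOAD itself
        "RESET query_group;",                    # back to default routing
    ]


# The UNLOAD text here is abbreviated placeholder content.
for sql in route_to_queue("UNLOAD ('SELECT 1') TO 's3://example-bucket/out_' IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleUnloadRole';"):
    print(sql)
```

All three statements need to run on the same session, since the query group setting applies per session. The concurrency and memory limits themselves live in the queue definition, not in this code.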

Upvotes: 1
