Reputation: 1182
We are currently using an ETL tool (over Hadoop) that lets our non-technical users create a CSV file with raw data. Dev builds a process for them according to their needs, and they run it on demand. Since we use the data lake files (S3) to create the output, we need to join all the facts together and run some heavy-duty jobs that take Hadoop a while to complete.
We would like these processes to run in a shorter time. My thought is to use Redshift for this task via the UNLOAD command. Since the data in Redshift is already modeled according to business requirements, getting users what they want is usually a very simple query that runs for 2-5 minutes.
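For reference, the kind of statement the built process would issue looks roughly like this (the bucket path, IAM role, and table names below are placeholders):

    -- Sketch of the on-demand export; bucket, role ARN, and table are made up.
    UNLOAD ('SELECT order_id, customer_id, amount
             FROM sales_mart.fact_orders
             WHERE order_date >= CURRENT_DATE - 30')
    TO 's3://my-data-lake/exports/orders_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
    CSV HEADER
    PARALLEL OFF;  -- write a single file so users get one CSV instead of parts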
However, I am not sure whether giving our users the option of running an UNLOAD command on demand (not by themselves, but through the built process) might put a strain on Redshift.
Can anyone provide some info on this? We expect about 20 queries a day, each running 2-4 minutes.
Thanks
Nir
Upvotes: 1
Views: 513
Reputation: 401
I'm running hundreds of UNLOADs a day (to send Redshift data to external APIs, or to back up, restore, or deep-copy tables) and have never run into a problem.
Performance seems to be the same as the equivalent SELECT (a little more if you use compression).
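For example, the compressed variant of an unload differs only by the compression option (the path and role below are placeholders):

    -- GZIP trades a little extra CPU during the unload for smaller S3 objects.
    UNLOAD ('SELECT * FROM sales_mart.fact_orders')
    TO 's3://my-data-lake/backups/fact_orders_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
    GZIP
    ALLOWOVERWRITE;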
Upvotes: 0
Reputation: 2828
It's not much more demanding than any other SELECT. I suggest defining a dedicated WLM (workload management) queue for these users. That way you can cap the resources they consume and isolate whatever impact they have from the rest of the system.
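A minimal sketch of the routing side, assuming your WLM configuration defines a queue whose query_group list includes 'csv_export' (the group name, query, path, and role are placeholders):

    -- Route this session's queries to the dedicated queue, then run the export.
    SET query_group TO 'csv_export';

    UNLOAD ('SELECT order_id, customer_id, amount FROM sales_mart.fact_orders')
    TO 's3://my-data-lake/exports/orders_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
    CSV HEADER;

    RESET query_group;

The queue itself is defined in the cluster's parameter group, where you can set its concurrency, memory share, and a timeout, so a runaway export can't monopolize the cluster.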
Upvotes: 1