Reputation: 1244
Originally, the question was in the DB Administrators site here, but since it's not that noticed over there, I thought I'd bring it up in here. I'm going to make a summary, since the whole question is about being confused how to set up parallel queries in Amazon Redshift:
We are building a business intelligence system, our back is in JSF and our DB machine is Amazon Redshift.
We're sending the queries in parallel. At first it didn't seems to perform any better, like they were sent in parallel, but being resolved in sequence in the DB machine.
We found in the documentation:
http://docs.aws.amazon.com/redshift/latest/dg/cm-c-executing-queries.html
http://docs.aws.amazon.com/redshift/latest/dg/c_troubleshooting_query_performance.html
http://docs.aws.amazon.com/redshift/latest/dg/r_wlm_query_slot_count.html
That redshift by default receive 5 queries at same time, but that is a setting we can change.
There are 3 main things to take into account: query slots, concurrency and queues. We have understood this:
A queue is like a thread in Java. A query arrives and is designated to the "less loaded" queue, and it waits for its turn to be resolved. We can have as many queues as we like. A queue has some memory allocated (we guess divided equally?) In a queue we can assign user groups or queries groups. But in short-term, that's a lot of classification work in our queries we can't do right now.
A concurrency is the amount of queries that a queue can run in parallel. By default is 5.
And a query slot is the amount of memory a query can use. It's related to concurrency as we understood it. The more concurrency a queue has, the less memory in each query slot it has.
We have tried to have 3 queues, each one with concurrency 5; performance increased a lot, like 40%, but I think there are better ways to set this up.
So, had we understood correctly? We have some views doing up to 25-28 queries, and the total amount of loading time is around 60s, how can we leave the settings the queries can be resolved faster?
Upvotes: 1
Views: 961
Reputation: 465
If all your queries look-alike, you can use only one queue with 100% memory and increase its concurrency level to 30. (however it's usually not recommended to set the concurrency level to something > 15-20)
From my understanding, you usually define multiple queues when some of your queries are "heavy duty". This lets you redirect the heavy duty queries in one queue and the "normal" queries to another. This way you can make sure that your "normal" queries don't get stuck behind some heavy duty query that will last for hours.
Upvotes: 0