Reputation:
In the Amazon Redshift Documentation, they advise to have ordering in the ORDERBY and GROUPBY clauses:
If you use both GROUP BY and ORDER BY clauses, make sure you put the columns in the same order in both. That is, use the following approach:
group by a, b, c
order by a, b, c
Don't use this approach:
group by b, c, a
order by a, b, c
Source: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html
They don't explain why! Does anyone have any ideas?
Upvotes: 2
Views: 2602
Reputation: 14035
GROUP BY
implicitly requires the data to be sorted. If your ORDER BY
is different the data will have to be sorted twice.
And, since Redshift distributes it's data among multiple nodes, that may mean the data has to be sent to all nodes twice.
If the data set being returned by GROUP BY
is small (10s or 100s of rows) you won't notice, but if it's large (millions of rows) it will be very slow.
Upvotes: 3