user1008537
user1008537

Reputation:

Why does the Order of GROUPBY and ORDERBY matter in Redshift?

In the Amazon Redshift Documentation, they advise to have ordering in the ORDERBY and GROUPBY clauses:

If you use both GROUP BY and ORDER BY clauses, make sure you put the columns in the same order in both. That is, use the following approach:

group by a, b, c

order by a, b, c

Don't use this approach:

group by b, c, a

order by a, b, c

Source: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html

They don't explain why! Does anyone have any ideas?

Upvotes: 2

Views: 2602

Answers (1)

Joe Harris
Joe Harris

Reputation: 14035

GROUP BY implicitly requires the data to be sorted. If your ORDER BY is different the data will have to be sorted twice.

And, since Redshift distributes it's data among multiple nodes, that may mean the data has to be sent to all nodes twice.

If the data set being returned by GROUP BY is small (10s or 100s of rows) you won't notice, but if it's large (millions of rows) it will be very slow.

Upvotes: 3

Related Questions