Luke101
Luke101

Reputation: 65238

Trying to understand over() and partition by

I am trying to get the over and partition by functionality wrapped around my head. Here is an example that I just do not understand.

Here is the data I have:

SALESORDERID       ORDERDATE 
43894              08/01/2001 
43664              07/01/2001 
43911              08/01/2001 
43867              08/01/2001 
43877              08/01/2001 
44285              10/01/2001 
44501              11/01/2001 
43866              08/01/2001 
43895              08/01/2001 
43860              08/01/2001

When I run this query:

select Row_Number() over(partition by orderdate order by orderdate asc) 
    as Rownumber, salesorderid, orderdate
from test2
order by rownumber

Here are the results I get:

ROWNUMBER     SALESORDERID       ORDERDATE 
1             43664              07/01/2001 
1             43911              08/01/2001 
1             44109              09/01/2001 
1             44483              11/01/2001 
1             44285              10/01/2001 
2             43867              08/01/2001 
2             44501              11/01/2001 
3             43895              08/01/2001 
4             43894              08/01/2001 
5             43877              08/01/2001 

Can someone explain this query to me. I am not new to SQL but windowing I have been struggling with and can't get my head wrapped around this.

Upvotes: 23

Views: 31058

Answers (2)

Chi
Chi

Reputation: 23164

Try ordering by order date, you'll see the results more easily

select Row_Number() over(partition by orderdate order by orderdate asc) 
    as Rownumber, salesorderid, orderdate
from test2
order by orderdate;

should give (i've added blank lines for clarity)

ROWNUMBER     SALESORDERID       ORDERDATE
1             43664              07/01/2001

1             43911              08/01/2001
2             43867              08/01/2001
3             43895              08/01/2001
4             43894              08/01/2001
5             43877              08/01/2001

1             44109              09/01/2001

1             44285              10/01/2001

1             44483              11/01/2001
2             44501              11/01/2001

You'll notice that the result is divided into 'partitions', each partition being the set of rows with identical orderdates. That is what 'partition by orderdate' means.

Within a partition, the rows are ordered by orderdate, as per the second clause of '(partition by orderdate order by orderdate asc)'. That isn't very useful, as all rows within a partition are going to have the same orderdate. Because of that, the ordering of the rows within a partition is random. Try ordering by salesorderid within the partition by clause to have a more reproducable result.

row_number() just returns the row's ordering within each partition

Upvotes: 35

ruakh
ruakh

Reputation: 183270

The partition by orderdate means that you're only comparing records to other records with the same orderdate. For example, of the five records with orderdate = '08/01/2001', one will have row_number() = 1, one will have row_number() = 2, and so on.

The order by orderdate asc means that, within a partition, row-numbers are to be assigned in order of orderdate. In your example that has no effect, because you're already partitioning by orderdate, so all records within a partition will have the same orderdate. (It would be like writing SELECT ... FROM t WHERE c = 6 ORDER BY c: all selected records have the same value of c, so the ORDER BY c does nothing.) So, within a partition, the assignment of row_number() is arbitrary: each row will have a different number, but there are no guarantees about which row will have which number.

Upvotes: 14

Related Questions