Aki
Aki

Reputation: 419

SQL 'partition by order by' turns count() into rank()?

I am trying to figure out how to use partition by properly, and looking for a brief explanation to the following results. (I apologize for including the test data without proper SQL code.)

Example 1: Counts the IDs (e.g. shareholders) for each company and adds it to the original data frame (as "newvar").

select ID, company, 
count(ID) over(partition by company) as newvar
from testdata;

Example 2: When I now add order by shares count() somehow seems to turn into rank(), so that the output is merely a ranking variable.

select ID, company, 
count(ID) over(partition by company order by shares) as newvar
from testdata;

I thought order by just orders the data, but it seems to have an impact on "newvar".

Is there a simple explanation to this?

Many thanks in advance!

.csv file that contains testdata:

ID;company;shares
1;a;10
2;a;20
3;a;70
1;b;50
4;b;10
5;b;10
6;b;30
2;c;80
3;c;10
7;c;10
1;d;20
2;d;30
3;d;25
6;d;10
7;d;15

Upvotes: 0

Views: 40

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270733

count() with an order by does a cumulative count. It is going to turn the value either into rank() or row_number(), depending on ties in the shares value and how the database handles missing windows frames (rows between or range between).

If you want to just order the data, then the order by should be after the from clause:

select ID, company, 
       count(ID) over(partition by company) as newvar
from testdata
order by shares;

Upvotes: 1

Related Questions