Reputation: 419
I am trying to figure out how to use partition by
properly, and looking for a brief explanation to the following results. (I apologize for including the test data without proper SQL code.)
Example 1: Counts the IDs (e.g. shareholders) for each company and adds it to the original data frame (as "newvar").
select ID, company,
count(ID) over(partition by company) as newvar
from testdata;
Example 2: When I now add order by shares
count() somehow seems to turn into rank(), so that the output is merely a ranking variable.
select ID, company,
count(ID) over(partition by company order by shares) as newvar
from testdata;
I thought order by just orders the data, but it seems to have an impact on "newvar".
Is there a simple explanation to this?
Many thanks in advance!
.csv file that contains testdata:
ID;company;shares
1;a;10
2;a;20
3;a;70
1;b;50
4;b;10
5;b;10
6;b;30
2;c;80
3;c;10
7;c;10
1;d;20
2;d;30
3;d;25
6;d;10
7;d;15
Upvotes: 0
Views: 40
Reputation: 1270733
count()
with an order by
does a cumulative count. It is going to turn the value either into rank()
or row_number()
, depending on ties in the shares
value and how the database handles missing windows frames (rows between
or range between
).
If you want to just order the data, then the order by
should be after the from
clause:
select ID, company,
count(ID) over(partition by company) as newvar
from testdata
order by shares;
Upvotes: 1