Reputation: 47
I want an equivalent of SQL's rank in the Python pandas module, so that I can filter out all the dates which have rank = 1.
I have the following dataframe:
id date
12 2021-06-01
12 2021-06-15
12 2021-06-21
34 2021-06-05
87 2021-06-19
53 2021-06-05
and I need to apply rank in the following way:
id date rank
12 2021-06-01 1
12 2021-06-15 2
12 2021-06-21 3
34 2021-06-05 1
87 2021-06-19 1
53 2021-06-05 1
The SQL for this would be:
select id, date, rank() over (partition by id order by date asc) as rank
from table;
Upvotes: 3
Views: 699
Reputation: 862831
IIUC, use GroupBy.rank:
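import pandas as pd

# parse the strings so the dates are ranked chronologically
df['date'] = pd.to_datetime(df['date'])
# rank the dates within each id; rank() returns floats, so cast to int
df['rank'] = df.groupby('id')['date'].rank(method='dense').astype(int)
print(df)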
   id       date  rank
0  12 2021-06-01     1
1  12 2021-06-15     2
2  12 2021-06-21     3
3  34 2021-06-05     1
4  87 2021-06-19     1
5  53 2021-06-05     1
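For completeness, here is a self-contained sketch of the same approach that also applies the rank = 1 filter the question asks about. The DataFrame construction and the name `first_dates` are assumptions made for illustration, and the filter keeps the rank-1 rows; invert the mask with `~` if those rows should be dropped instead.

```python
import pandas as pd

# Sample data copied from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "id": [12, 12, 12, 34, 87, 53],
    "date": ["2021-06-01", "2021-06-15", "2021-06-21",
             "2021-06-05", "2021-06-19", "2021-06-05"],
})
df["date"] = pd.to_datetime(df["date"])

# Rank the dates within each id, as in the answer above.
df["rank"] = df.groupby("id")["date"].rank(method="dense").astype(int)

# Keep only the rows whose rank is 1 (hypothetical name: first_dates).
first_dates = df[df["rank"] == 1]
print(first_dates)
```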
If the datetimes are sorted within each group, GroupBy.cumcount is also possible:
df = df.sort_values(['id','date'])
df['rank'] = df.groupby('id')['date'].cumcount().add(1)
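The two approaches agree on the sample data because no id has a repeated date. With ties they differ, so it may be worth checking which SQL function is really wanted. The sketch below uses an invented frame with one duplicated date (not from the question) to compare `method='min'`, `method='dense'` and `cumcount`, which behave like SQL's RANK(), DENSE_RANK() and ROW_NUMBER() respectively. The column names are hypothetical.

```python
import pandas as pd

# Invented data: id 12 has the date 2021-06-01 twice to create a tie.
df = pd.DataFrame({
    "id": [12, 12, 12],
    "date": pd.to_datetime(["2021-06-01", "2021-06-01", "2021-06-15"]),
})
df = df.sort_values(["id", "date"])

# Ties share a rank and the next rank is skipped, like SQL RANK().
df["rank_min"] = df.groupby("id")["date"].rank(method="min").astype(int)
# Ties share a rank and no rank is skipped, like SQL DENSE_RANK().
df["rank_dense"] = df.groupby("id")["date"].rank(method="dense").astype(int)
# Every row gets its own number in sorted order, like SQL ROW_NUMBER().
df["row_number"] = df.groupby("id")["date"].cumcount().add(1)
print(df)
```

Since the question's SQL uses rank(), `method='min'` is the closest match if duplicate dates per id can occur.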
Upvotes: 4