Reputation: 111
suppose that I have this dataframe
date id
0 2019-10-01 303
1 2019-10-01 303
2 2019-10-01 404
3 2019-10-01 404
4 2019-10-02 303
5 2019-10-02 303
6 2019-10-02 404
7 2019-10-02 404
I want to apply ranking so that I have this dataframe
date id id_rank
0 2019-10-01 303 1
1 2019-10-01 303 1
2 2019-10-01 404 1
3 2019-10-01 404 1
4 2019-10-02 303 2
5 2019-10-02 303 2
6 2019-10-02 404 2
7 2019-10-02 404 2
I tried pandas.groupby.rank()
with this code
df["id_rank"] = df.groupby(["index_date", "id"])["id"].rank(method="first")
which gets me closer to the wanted result
date id id_rank
0 2019-10-01 303 1
1 2019-10-01 303 1
2 2019-10-01 404 1
3 2019-10-01 404 1
4 2019-10-02 303 1
5 2019-10-02 303 1
6 2019-10-02 404 1
7 2019-10-02 404 1
I want to rank based on date. If the id is registered in the dataframe for the first day, it takes rank 1 and if it's registered again on the second day, it takes rank 2 and so on; rank 3 if the id is registered on a third day
Upvotes: 0
Views: 32
Reputation: 46
The ranking logic is not entirely clear. There are two ways it can be interpreted: (a) everything with the same date has the same rank or (b) the dates are ranked per id. Given the name id_rank
I assume that it is the second option.
If the ranking logic is (a), you can simply write
df["id_rank"] = df.date.rank(method="dense").astype("int")
The method="dense"
argument results in the ranking per unique value, rather than an index to the sorted results as the normal rank()
function returns.
If the ranking logic is (b), you can use a similar idea as in the first method, but you first group by the id
before ranking:
df["id_rank"] = df.groupby("id").date.rank(method="dense").astype("int")
To illustrate the difference, suppose you instead have the dataframe
date id
0 2019-10-01 303
1 2019-10-01 404
2 2019-10-01 404
3 2019-10-02 303
4 2019-10-02 404
5 2019-10-02 405
6 2019-10-03 404
7 2019-10-03 405
Then the result given by the two different options would be:
date id option_a option_b
0 2019-10-01 303 1 1
1 2019-10-01 404 1 1
2 2019-10-01 404 1 1
3 2019-10-02 303 2 2
4 2019-10-02 404 2 2
5 2019-10-02 405 2 1
6 2019-10-03 404 3 3
7 2019-10-03 405 3 2
The key difference being whether the ranks for id 405
starts at 1 or 2.
Upvotes: 2