Reputation: 2253
I have this dataframe:
user1   user2   quantity
--------------------------
Alice   Carol   10
Alice   Bob     5
Bob     Dan     2
Carol   Eve     7
Carol   Dan     100
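For anyone who wants to reproduce this, the frame can be built roughly like so (a minimal sketch; the column names and values are copied from the table above, and the default integer index is an assumption):

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "user1": ["Alice", "Alice", "Bob", "Carol", "Carol"],
    "user2": ["Carol", "Bob", "Dan", "Eve", "Dan"],
    "quantity": [10, 5, 2, 7, 100],
})
print(df)
```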
I want to rank each row in descending order of quantity, but separately within each user1. Example:
user1   user2   quantity   order
----------------------------------
Alice   Carol   10         1
Alice   Bob     5          2
Bob     Dan     2          1
Carol   Eve     7          2
Carol   Dan     100        1
Currently, my code goes like this:
users = df['user1'].unique()
for user in users:
    cond = (df['user1'] == user)
    sort_ser = df[cond]['quantity'].values.argsort()[::-1]  # descending
    df.loc[cond, 'order'] = sort_ser + 1
It works, but only for small dataframes; on large ones it is slow. I think that's because (1) I'm looping over every user, and (2) a separate sort runs for each user. Is there a faster way to do this?
Upvotes: 4
Views: 958
Reputation: 294488
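With some NumPy:
import numpy as np
# row positions sorted by user1, then by quantity descending
a = np.lexsort([-df.quantity, df.user1])
# idx: first position of each user; inv: group code of every row
u, idx, inv = np.unique(df.user1, return_index=True, return_inverse=True)
df.assign(order=a - idx.repeat(np.bincount(inv)) + 1)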
   user1  user2  quantity  order
0  Alice  Carol        10      1
1  Alice    Bob         5      2
2    Bob    Dan         2      1
3  Carol    Eve         7      2
4  Carol    Dan       100      1
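One caveat: the expression above subtracts group offsets from the sort positions returned by lexsort, not from ranks. The two coincide on the sample because the rows are already grouped by user1 and no user has more than two rows; with larger or interleaved groups they can differ. A hedged sketch of a variant that scatters the positions back to the original rows is below. The helper name rank_desc_within and the six-row test frame are my own, not from the question.

```python
import numpy as np
import pandas as pd

def rank_desc_within(keys, values):
    """Hypothetical helper: 1-based descending rank of values within each key."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    # Integer code per group, plus the size of every group.
    _, codes, counts = np.unique(keys, return_inverse=True, return_counts=True)
    # Row positions sorted by group, then by value descending
    # (lexsort treats the LAST key as the primary one).
    order = np.lexsort([-values, codes])
    # Offset at which each group starts inside the sorted order.
    starts = np.cumsum(counts) - counts
    ranks = np.empty(len(values), dtype=np.int64)
    # Scatter: the row at sorted position p gets p - start_of_its_group + 1.
    ranks[order] = np.arange(len(values)) - starts[codes[order]] + 1
    return ranks

# Deliberately unsorted, with a three-row group.
df = pd.DataFrame({
    "user1": ["Bob", "Alice", "Alice", "Carol", "Alice", "Carol"],
    "quantity": [2, 5, 10, 7, 7, 100],
})
df["order"] = rank_desc_within(df["user1"], df["quantity"])
print(df)
```

The scatter step (ranks[order] = ...) is what turns sort positions into ranks; rows with equal quantity inside a group still get distinct, arbitrary-but-consecutive ranks.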
Upvotes: 0
Reputation: 153500
Use:
df['order'] = df.groupby('user1')['quantity'].rank(ascending=False).astype(int)
Output:
   user1  user2  quantity  order
0  Alice  Carol        10      1
1  Alice    Bob         5      2
2    Bob    Dan         2      1
3  Carol    Eve         7      2
4  Carol    Dan       100      1
Details: rank returns floats, which is why the result is cast with astype(int):
df.groupby('user1')['quantity'].rank(ascending=False)
Output:
0    1.0
1    2.0
2    1.0
3    2.0
4    1.0
Name: quantity, dtype: float64
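The floats matter once quantities tie within a user: with the default method="average", tied rows share a fractional or shifted rank, and astype(int) then truncates whatever comes out. If that is a concern, the method argument of rank can be set explicitly. A small sketch, using a made-up four-row frame with a three-way tie (the data is illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data: three rows tied at 10 for the same user.
df = pd.DataFrame({
    "user1": ["Alice", "Alice", "Alice", "Alice"],
    "quantity": [10, 10, 10, 5],
})
grouped = df.groupby("user1")["quantity"]

# Default method="average": the tied rows share the mean of ranks 1, 2, 3.
avg = grouped.rank(ascending=False)

# method="first": ties are broken so that every row gets a distinct rank.
first = grouped.rank(method="first", ascending=False).astype(int)

# method="dense": tied rows share a rank and the next rank is not skipped.
dense = grouped.rank(method="dense", ascending=False).astype(int)

print(pd.DataFrame({"average": avg, "first": first, "dense": dense}))
```

method="first" is the closest match to the ordering in the question, since it yields 1..n within each user with no repeats.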
Upvotes: 6