irene

Reputation: 2253

pandas: faster way than argsort to rank rows within DataFrame subsets

I have this dataframe:

user1    user2   quantity
--------------------------
Alice    Carol     10
Alice    Bob       5
Bob      Dan       2
Carol    Eve       7
Carol    Dan      100

I want to rank each row in descending order of quantity, but separately for each user1. Example:

user1    user2   quantity   order
----------------------------------
Alice    Carol     10       1
Alice    Bob       5        2
Bob      Dan       2        1
Carol    Eve       7        2
Carol    Dan      100       1

Currently, my code goes like this:

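users = df['user1'].unique()
for user in users:
    cond = (df['user1'] == user)
    # argsort()[::-1] gives the row positions in descending order of quantity;
    # a second argsort turns those positions into each row's 0-based rank
    ranks = df.loc[cond, 'quantity'].values.argsort()[::-1].argsort()
    df.loc[cond, 'order'] = ranks + 1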

It works for small dataframes, but it is slow on large ones. I think that's because (1) I'm essentially running it once per user, and (2) several sorts take place. Is there a faster way to do this?

Upvotes: 4

Views: 958

Answers (2)

piRSquared

Reputation: 294488

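With some NumPy:

import numpy as np
a = np.lexsort([-df.quantity, df.user1])  # row positions sorted by user1, then quantity descending
counts = np.bincount(np.unique(df.user1, return_inverse=True)[1])  # rows per user, in sorted user order
pos = np.arange(len(df)) - (counts.cumsum() - counts).repeat(counts)  # 0-based position inside each user's block
order = np.empty(len(df), dtype=int)
order[a] = pos + 1  # scatter the ranks back to the original row order

df.assign(order=order)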

   user1  user2  quantity  order
0  Alice  Carol        10      1
1  Alice    Bob         5      2
2    Bob    Dan         2      1
3  Carol    Eve         7      2
4  Carol    Dan       100      1
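
The sample above has at most two rows per user and is already grouped by user1, so it cannot tell a sort permutation apart from a true rank. As a rough sanity check, here is a minimal sketch that compares a lexsort-based ranking with pandas' own groupby rank on larger groups. The helper name rank_within_groups and the random data are made up for illustration, and quantities are kept distinct so tie handling does not matter.

```python
import numpy as np
import pandas as pd

def rank_within_groups(keys, values):
    """1-based descending rank of `values` within each group of `keys` (hypothetical helper)."""
    codes, _ = pd.factorize(keys)            # integer label per group
    values = np.asarray(values)
    sorter = np.lexsort([-values, codes])    # row positions by group, then value descending
    counts = np.bincount(codes)              # rows per group
    starts = np.cumsum(counts) - counts      # where each group begins in sorted order
    ranks = np.empty(len(values), dtype=np.int64)
    # position inside the group, scattered back to the original row order
    ranks[sorter] = np.arange(len(values)) - np.repeat(starts, counts) + 1
    return ranks

# Made-up data: 1,000 rows, 20 users, distinct quantities (so no ties).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "user1": rng.integers(0, 20, size=1_000),
    "quantity": rng.permutation(1_000),
})

numpy_order = rank_within_groups(df["user1"], df["quantity"])
pandas_order = (
    df.groupby("user1")["quantity"].rank(ascending=False).astype(int).to_numpy()
)
```

If the two arrays agree on data like this, the NumPy version is at least consistent with groupby rank when there are no ties.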

Upvotes: 0

Scott Boston

Reputation: 153500

Use:

df['order'] = df.groupby('user1')['quantity'].rank(ascending=False).astype(int)

Output:

   user1  user2  quantity  order
0  Alice  Carol        10      1
1  Alice    Bob         5      2
2    Bob    Dan         2      1
3  Carol    Eve         7      2
4  Carol    Dan       100      1

Details: before the cast to int, rank returns floats.

df.groupby('user1')['quantity'].rank(ascending=False)

Output:

0    1.0
1    2.0
2    1.0
3    2.0
4    1.0
Name: quantity, dtype: float64
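
One caveat worth checking on real data: rank defaults to method="average", so tied quantities within a user share a fractional rank, which astype(int) would then truncate. The sketch below uses made-up data with a tie (not the question's dataframe) to show how the method argument changes the result; which method is appropriate depends on how ties should be treated.

```python
import pandas as pd

# Made-up data: Alice has two rows tied at quantity 10.
df = pd.DataFrame({
    "user1": ["Alice", "Alice", "Alice", "Bob"],
    "quantity": [10, 10, 5, 2],
})
grouped = df.groupby("user1")["quantity"]

# Default method="average": the tied rows share rank 1.5, so a cast to int would truncate it.
avg = grouped.rank(ascending=False)

# method="first": tied rows get distinct consecutive ranks.
first = grouped.rank(ascending=False, method="first").astype(int)

# method="dense": tied rows share a rank and the next rank follows without a gap.
dense = grouped.rank(ascending=False, method="dense").astype(int)

print(pd.DataFrame({"average": avg, "first": first, "dense": dense}))
```

With method="first" or method="dense" the ranks are whole numbers, so the cast to int loses nothing.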

Upvotes: 6
