Reputation: 625
I'd like to sort the following pandas data frame by the result of df['user_id'].value_counts().
import pandas as pd
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n+1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1
The sort should be stable: whilst user 2's rows would come before user 1's rows, the rows within each user's group should stay in their original order.
I was thinking about using df.groupby, merging df['user_id'].value_counts() with the data frame, and also converting df['user_id'] to ordered categorical data. However, none of these approaches seemed particularly elegant.
Thanks in advance for any help!
Upvotes: 4
Views: 6604
Reputation: 294258
transform and argsort
Use kind='mergesort' for stability
df.iloc[df.groupby('user_id').user_id.transform('size').argsort(kind='mergesort')]
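To see what the one-liner does step by step, here is a minimal sketch on a toy frame. The frame `small` and its values are made up for illustration and are not from the question. `transform('size')` broadcasts each group's row count back onto every row, and a stable argsort over those counts gives row positions in ascending order of count, with ties left in their original order.

```python
import pandas as pd

# Hypothetical five-row frame, used only to illustrate the steps.
small = pd.DataFrame(
    {"user_id": [2, 1, 2, 2, 1]},
    index=pd.Index([10, 11, 12, 13, 14], name="gridimage_id"),
)

# Every row receives the size of its user_id group: 3, 2, 3, 3, 2.
sizes = small.groupby("user_id")["user_id"].transform("size")

# A stable argsort returns integer positions; rows with equal counts
# keep their original relative order.
order = sizes.argsort(kind="mergesort")

# Reorder by position. User 1 (2 rows) comes first because the sort is ascending.
result = small.iloc[order.to_numpy()]
print(result)
```

Note that this is ascending by count, so the least frequent user comes first.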
factorize, bincount, and argsort
Use kind='mergesort' for stability
import numpy as np
i, r = pd.factorize(df['user_id'])  # i: integer code per row, r: unique user ids
a = np.argsort(np.bincount(i)[i], kind='mergesort')  # stable sort on per-row counts
df.iloc[a]
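The intermediate arrays are easier to follow on a tiny input. The sketch below uses a hypothetical five-element Series (not the question's data) and the illustrative names `codes`, `counts`, and `per_row` to show how `np.bincount(i)[i]` turns the integer codes into a per-row count.

```python
import numpy as np
import pandas as pd

# Hypothetical user ids, for illustration only.
user_ids = pd.Series([2, 1, 2, 2, 1])

# factorize assigns codes in order of first appearance: 2 -> 0, 1 -> 1.
codes, uniques = pd.factorize(user_ids)

# bincount counts how often each code occurs: code 0 three times, code 1 twice.
counts = np.bincount(codes)

# Indexing the counts by the codes gives one count per row: 3, 2, 3, 3, 2.
per_row = counts[codes]

# Stable argsort over the per-row counts gives the row positions to pass to iloc.
positions = np.argsort(per_row, kind="mergesort")
print(positions)
```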
Thank you @piRSquared. Is it possible to reverse the sort order, though? value_counts is in descending order. In the example, user 2 has 90 rows and user 1 has 10 rows. I'd like user 2's rows to come first. Unfortunately, Series.argsort ignores the order kwarg. – Iain Dillingham 4 mins ago
Make the counts negative
df.iloc[df.groupby('user_id').user_id.transform('size').mul(-1).argsort(kind='mergesort')]
Or
i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
df.iloc[a]
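As a sanity check, the negated-count version can be run against the frame from the question. This is only a sketch; the names `codes`, `order`, and `out` are illustrative. It confirms that user 2's 90 rows come first, followed by user 1's 10 rows, each group still in its original gridimage_id order.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n + 1), name="gridimage_id"))
df["user_id"] = 2
df["has_term"] = True
df.iloc[:10, 0] = 1

# Negating the counts makes the ascending stable sort behave as a descending one.
codes, _ = pd.factorize(df["user_id"])
order = np.argsort(-np.bincount(codes)[codes], kind="mergesort")
out = df.iloc[order]

# User 2 (90 rows) comes first, in original order; user 1 (10 rows) follows.
assert (out["user_id"].iloc[:90] == 2).all()
assert out.index[:90].tolist() == list(range(11, 101))
assert out.index[90:].tolist() == list(range(1, 11))
```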
Upvotes: 12