Iain Dillingham

Reputation: 625

How to sort a pandas data frame by value counts of a column?

I'd like to sort the following pandas data frame by the result of df['user_id'].value_counts().

import pandas as pd
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n+1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

The sort should be stable: user 2's rows would come before user 1's rows, but within each user's group the rows would stay in their original order.

I was thinking about using df.groupby, merging df['user_id'].value_counts() with the data frame, and also converting df['user_id'] to ordered categorical data. However, none of these approaches seemed particularly elegant.

Thanks in advance for any help!

Upvotes: 4

Views: 6604

Answers (1)

piRSquared

Reputation: 294258

transform and argsort

Use kind='mergesort' for stability (the default, quicksort, is not stable). This orders the rows by ascending count.

df.iloc[df.groupby('user_id').user_id.transform('size').argsort(kind='mergesort')]
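To make the one-liner easier to follow, here is a rough step-by-step sketch on the frame from the question. The intermediate names (sizes, order, result) are mine and only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n + 1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

# Per-row group size: each row gets the number of rows sharing its user_id.
sizes = df.groupby('user_id')['user_id'].transform('size')

# Stable argsort: row positions ordered by ascending group size,
# with ties kept in their original order.
order = sizes.argsort(kind='mergesort')

# Reorder the frame by position.
result = df.iloc[order.to_numpy()]
```

With this data, user 1 (10 rows) comes before user 2 (90 rows), because the sort is ascending by count.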

factorize, bincount, and argsort

Use kind='mergesort' for stability

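import numpy as np

# i holds an integer code per row; r holds the unique user_ids
i, r = pd.factorize(df['user_id'])
# np.bincount(i)[i] is each row's group size
a = np.argsort(np.bincount(i)[i], kind='mergesort')
df.iloc[a]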
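If the factorize/bincount step looks opaque, a small sketch on a toy column may help. The data and the variable names below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column, chosen so that the groups are interleaved.
user_id = pd.Series([2, 1, 2, 2, 1, 3])

# Integer code per row, numbered in order of first appearance.
codes, uniques = pd.factorize(user_id)   # codes: [0, 1, 0, 0, 1, 2]

# Number of rows for each code.
counts = np.bincount(codes)              # [3, 2, 1]

# Broadcast the counts back to one value per row.
per_row = counts[codes]                  # [3, 2, 3, 3, 2, 1]

# Stable argsort: ascending by count, ties in original order.
order = np.argsort(per_row, kind='mergesort')  # [5, 1, 4, 0, 2, 3]
```

Indexing counts with codes does the same job as transform('size') in the groupby version: it yields one group size per row.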

Response to Comments

Thank you @piRSquared. Is it possible to reverse the sort order, though? value_counts is in descending order. In the example, user 2 has 90 rows and user 1 has 10 rows. I'd like user 2's rows to come first. Unfortunately, Series.argsort ignores the order kwarg. – Iain Dillingham 4 mins ago

Quick and Dirty

Make the counts negative, so that the largest group sorts first

df.iloc[df.groupby('user_id').user_id.transform('size').mul(-1).argsort(kind='mergesort')]

Or

i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
df.iloc[a]
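As a rough check of why negating is used rather than simply reversing the ascending result, here is a sketch on a toy frame. The frame, the 'row' column, and the names negated and flipped are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with interleaved users; 'row' records the original order.
df = pd.DataFrame({'user_id': [1, 2, 2, 1, 2, 3], 'row': range(6)})

i, r = pd.factorize(df['user_id'])
per_row = np.bincount(i)[i]   # group size per row: [2, 3, 3, 2, 3, 1]

# Negating the counts: descending by count, original order kept within each group.
negated = df.iloc[np.argsort(-per_row, kind='mergesort')]

# Reversing an ascending stable sort: descending by count,
# but the order within each group is reversed too.
flipped = df.iloc[np.argsort(per_row, kind='mergesort')[::-1]]
```

Here negated['row'] comes out as 1, 2, 4, 0, 3, 5, while flipped['row'] comes out as 4, 2, 1, 3, 0, 5, so only the negated version meets the stability requirement in the question.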

Upvotes: 12
