Pandas DataFrame / Python: How to sort a data frame column based on its highest repeated value count?

Question

I have a data frame like the one below:

import pandas as pd

df = pd.DataFrame({'UserId': [1,2,2,3,3,3,4,4,4,4], 'Value': [1,2,3,4,5,6,7,8,9,0]})

print(df)

Now I want to sort / display the UserId column by how often each value repeats, most frequent first. In the data frame above, that order is 4, 3, 2, 1. My expected output is:

df = pd.DataFrame({'UserId': [4,4,4,4,3,3,3,2,2,1], 'Value': [7,8,9,0,4,5,6,2,3,1]})

print(df)

Here I built the result by hand. I need code that works for large data frames. Any guidance would be appreciated. Thanks in advance.

araraonline · Accepted Answer

You can first get the count for each UserId:

>>> counts = df.UserId.value_counts()
>>> counts
4    4
3    3
2    2
1    1
Name: UserId, dtype: int64

Then you can add a new column holding the count for each row's UserId (this could also be done with a merge):

>>> df['UserIdCount'] = df['UserId'].apply(lambda x: counts.loc[x])
>>> df
   UserId  Value  UserIdCount
0       1      1            1
1       2      2            2
2       2      3            2
3       3      4            3
4       3      5            3
5       3      6            3
6       4      7            4
7       4      8            4
8       4      9            4
9       4      0            4
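
The same count column can be built without a Python-level lambda. The following is a minimal sketch of three alternatives, assuming the data frame from the question; the names via_map, via_transform, counts_df and via_merge are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})

counts = df['UserId'].value_counts()

# Option 1: Series.map looks each UserId up in the counts Series.
via_map = df['UserId'].map(counts)

# Option 2: groupby + transform broadcasts each group's row count
# back onto the rows of that group.
via_transform = df.groupby('UserId')['UserId'].transform('count')

# Option 3: the merge mentioned in the answer. The counts Series is first
# turned into a two-column frame with explicit column names.
counts_df = counts.rename('UserIdCount').rename_axis('UserId').reset_index()
via_merge = df.merge(counts_df, on='UserId', how='left')
```

All three should give the same counts per row; map and transform return a Series you can assign to df['UserIdCount'], while the merge returns a new data frame that already contains the extra column.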

Then, you just sort by this column :)

>>> df = df.sort_values('UserIdCount', ascending=False)
>>> df
   UserId  Value  UserIdCount
6       4      7            4
7       4      8            4
8       4      9            4
9       4      0            4
3       3      4            3
4       3      5            3
5       3      6            3
1       2      2            2
2       2      3            2
0       1      1            1
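
One caveat: sort_values uses quicksort by default, which is not a stable sort, so the original row order inside each user is not guaranteed, and users with equal counts may end up interleaved. A minimal sketch of one way to handle both cases, assuming you want ties broken by ascending UserId; sort_by_frequency and the temporary _freq column are hypothetical names introduced here, not part of the original answer.

```python
import pandas as pd


def sort_by_frequency(df, column):
    """Return rows ordered by how often each value of `column` occurs,
    most frequent first, keeping the original row order within each value."""
    # Per-row occurrence count, aligned on df's index.
    freq = df[column].map(df[column].value_counts())
    return (
        df.assign(_freq=freq)
          # First stable pass: group equal values of `column` together.
          .sort_values(column, kind='mergesort')
          # Second stable pass: order by count, keeping the grouping from
          # the first pass wherever counts are tied.
          .sort_values('_freq', ascending=False, kind='mergesort')
          .drop(columns='_freq')
    )


df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})

# reset_index(drop=True) renumbers the rows 0..n-1, as in the expected output.
result = sort_by_frequency(df, 'UserId').reset_index(drop=True)
print(result)
```

The helper column is dropped at the end, so the result has only the UserId and Value columns from the question.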

Cheers!
