I have a data frame like this:
import pandas as pd
df = pd.DataFrame({'UserId': [1,2,2,3,3,3,4,4,4,4], 'Value': [1,2,3,4,5,6,7,8,9,0]})
print(df)
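I want to sort the rows by how often each UserId occurs, most frequent first, keeping all rows of the same UserId together. In the data frame above, 4 occurs four times, 3 three times, 2 twice and 1 once, so the order should be 4, 3, 2, 1. My expected output is: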
df = pd.DataFrame({'UserId': [4,4,4,4,3,3,3,2,2,1], 'Value': [7,8,9,0,4,5,6,2,3,1]})
print(df)
I built this output by hand, but I need code that does it for a large data frame. Any guidance would be appreciated. Thanks in advance.
You can first get the count for each UserId:
>>> counts = df.UserId.value_counts()
>>> counts
4    4
3    3
2    2
1    1
Name: UserId, dtype: int64
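As an aside, value_counts already lists the IDs from most to least frequent, so its index alone can drive the sort without a helper column. A minimal sketch, assuming pandas 1.1 or later (for the `key` argument of `sort_values`); `order`, `rank` and `result` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})

# value_counts() returns the IDs ordered by frequency, most frequent first
order = df['UserId'].value_counts().index

# Give each UserId its position in that order (0 = most frequent)
rank = {uid: pos for pos, uid in enumerate(order)}

# Sort on the rank; a stable sort keeps each user's rows in their original order
result = df.sort_values('UserId', key=lambda s: s.map(rank), kind='stable')
print(result)
```

Because every UserId gets a distinct rank, rows of the same user always end up contiguous, even when two users have the same count.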
Then, you can create a new column that holds the UserId count for each row (this could also be done with a merge):
>>> df['UserIdCount'] = df['UserId'].map(counts)
>>> df
   UserId  Value  UserIdCount
0       1      1            1
1       2      2            2
2       2      3            2
3       3      4            3
4       3      5            3
5       3      6            3
6       4      7            4
7       4      8            4
8       4      9            4
9       4      0            4
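The merge variant mentioned above could look roughly like this. It is a sketch, not the only way: it builds a small per-user count table with `groupby(...).size()` and left-merges it back, which keeps the original row order of `df`:

```python
import pandas as pd

df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})

# One row per UserId with the number of times it occurs
counts = df.groupby('UserId').size().rename('UserIdCount').reset_index()

# A left merge attaches the count to every row and preserves df's row order
df = df.merge(counts, on='UserId', how='left')
print(df)
```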
Then, you just sort by this column, using UserId as a tie-breaker so that rows of users with the same count stay together :)
>>> df = df.sort_values(['UserIdCount', 'UserId'], ascending=False)
>>> df
   UserId  Value  UserIdCount
6       4      7            4
7       4      8            4
8       4      9            4
9       4      0            4
3       3      4            3
4       3      5            3
5       3      6            3
1       2      2            2
2       2      3            2
0       1      1            1
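For a large data frame it may be convenient to wrap the steps above in one function. A minimal sketch; `sort_by_frequency` and the temporary `_freq` column are hypothetical names, and the helper column is dropped again so the result has the same columns as the input:

```python
import pandas as pd


def sort_by_frequency(df, col):
    """Sort rows so the most frequent values of `col` come first.

    Hypothetical helper: ties in frequency are broken by the value itself
    (descending), so rows sharing a value are never interleaved.
    """
    # Per-row frequency of that row's value
    freq = df[col].map(df[col].value_counts())
    return (
        df.assign(_freq=freq)
        .sort_values(['_freq', col], ascending=False)
        .drop(columns='_freq')
    )


df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})
print(sort_by_frequency(df, 'UserId'))
```

Sorting on the count alone would also put the largest groups first, but two users with the same count could then have their rows mixed together; the second sort key avoids that.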
Cheers!