Reputation: 35
Suppose I have a dataframe like the one below:
year month message
0 2018 2 txt1
1 2017 4 txt2
2 2019 5 txt3
3 2017 5 txt5
4 2017 5 txt4
5 2020 4 txt3
6 2020 6 txt3
7 2020 6 txt3
8 2020 6 txt4
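To make the examples in this thread reproducible, the sample frame can be rebuilt from the table above. This is a minimal sketch that assumes the columns are plain integers (year, month) and strings (message), as the table suggests:

```python
import pandas as pd

# Sample data rebuilt from the question's table: one row per message
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

print(df)
```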
I want to find the three months with the most messages in each year, so I grouped the data as below:
df.groupby(['year','month']).count()
which gives:
message
year month
2017 4 1
5 2
2018 2 1
2019 5 1
2020 4 1
6 3
Both index levels are sorted in ascending order. How can I get the result shown below instead, where the data is sorted by year (ascending) and count (descending) and only the top n values per year are kept? The order of the 'month' level itself does not matter.
message
year month
2017 5 2
4 1
2018 2 1
2019 5 1
2020 6 3
4 1
Upvotes: 2
Views: 275
Reputation: 150735
value_counts sorts by count (descending) by default:
df.groupby('year')['month'].value_counts()
Output:
year month
2017 5 2
4 1
2018 2 1
2019 5 1
2020 6 3
4 1
Name: month, dtype: int64
If you want only the top 2 values for each year, do another groupby and take head:
(df.groupby('year')['month'].value_counts()
.groupby('year').head(2)
)
Output:
year month
2017 5 2
4 1
2018 2 1
2019 5 1
2020 6 3
4 1
Name: month, dtype: int64
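The same idea can be wrapped in a small helper so n is a parameter (the question asks for the top three). The function name top_months is hypothetical, and the sketch relies on value_counts sorting counts in descending order within each year, as shown in the output above. Depending on the pandas version, the resulting Series may be named month or count; the values are the same.

```python
import pandas as pd

# Sample frame rebuilt from the question's table
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})


def top_months(frame: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n most frequent months per year (years ascending, counts descending)."""
    # value_counts on a SeriesGroupBy sorts counts in descending order within each year
    counts = frame.groupby("year")["month"].value_counts()
    # head(n) keeps the first n rows of each year, preserving that order
    return counts.groupby(level="year").head(n)


print(top_months(df, n=3))
```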
Upvotes: 2
Reputation: 3031
This will sort by year (ascending) and count (descending).
df = df.groupby(['year', 'month']).count().sort_values(['year', 'message'], ascending=[True, False])
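A sketch of this approach extended to the "top n" part of the question: sort_values accepts the index level name 'year' and the column 'message' together in by, and a groupby on the year level followed by head then keeps at most three months per year. The variable names are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question's table
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

counts = df.groupby(["year", "month"]).count()

# 'year' is an index level and 'message' is a column; both may appear in `by`
ordered = counts.sort_values(["year", "message"], ascending=[True, False])

# Keep at most three months per year, in the order established above
top3 = ordered.groupby(level="year").head(3)
print(top3)
```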
Upvotes: 2
Reputation: 96
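Here you go. Sort by count first, then by year; the second sort should be stable (kind='mergesort') so that the count order is kept within each year:
df.groupby(['year', 'month']).count().sort_values(axis=0, ascending=False, by='message').sort_values(axis=0, ascending=True, by='year', kind='mergesort')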
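The two-pass sort works because of stability: sort by the secondary key first, then by the primary key with a stable algorithm. pandas' default kind='quicksort' is not stable, which is why this sketch passes kind="mergesort" to the second call. It compares the result with the single multi-key sort; on this sample there are no ties on (year, count), so both orderings coincide.

```python
import pandas as pd

# Sample frame rebuilt from the question's table
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})
counts = df.groupby(["year", "month"]).count()

# Pass 1: order by the secondary key (count, descending)
by_count = counts.sort_values(by="message", ascending=False)
# Pass 2: order by the primary key (year, ascending). mergesort is stable, so
# rows with the same year keep the count order produced by pass 1.
two_pass = by_count.sort_values(by="year", ascending=True, kind="mergesort")

# Single multi-key sort for comparison
one_pass = counts.sort_values(["year", "message"], ascending=[True, False])

print(two_pass.equals(one_pass))  # → True
```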
Upvotes: 1
Reputation: 96
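You can sort the index in descending order and then re-sort by year (ascending) with a stable sort, which keeps the months of each year in descending order:
df.groupby(['year', 'month']).count().sort_index(axis=0, ascending=False).sort_values(by="year", ascending=True, kind="mergesort")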
Upvotes: 0
Reputation: 88236
You can use sort_index, specifying ascending=[True,False] so that only the second level is sorted in descending order:
df = df.groupby(['year','month']).count().sort_index(ascending=[True,False])
message
year month
2017 5 2
4 1
2018 2 1
2019 5 1
2020 6 3
4 1
Upvotes: 1