Reputation: 391
I'm grouping by two columns with the following line of code:
df.groupby('topic')['category'].value_counts()
I get the following output:
topic   category
topic1  Entertainment    1303
        Science           462
        Sports            351
        Economy           270
        Business          161
        Technology         92
        Education          40
        Politics           18
        Environment         5
topic2  Politics          134
        Economy           133
        Entertainment     110
        Sports             69
        Business           68
        Science            45
        Technology         22
        Education           7
        Environment         2
topic3  Entertainment    1370
        Sports            533
        Economy           485
        Science           335
        Business          207
        Politics          180
        Education         108
        Technology         97
        Environment        12
I want to get the topmost row for every topic (which is the most frequent category), something like this:
topic   category
topic1  Entertainment    1303
topic2  Politics          134
topic3  Entertainment    1370
Upvotes: 3
Views: 287
Reputation: 28352
In pandas, value_counts sorts the counts in descending order by default, so all you need to do is take the top value from each group. This can be done by applying a function:
def top_value_count(x):
    # value_counts() is sorted in descending order, so head(1) is the most frequent value
    return x.value_counts().head(1)

df.groupby('topic')['category'].apply(top_value_count)
Change the 1 to another number to return more values per topic.
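For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample rows are made up for illustration, and the n parameter is my own addition so the number of rows per topic can be changed without editing the function body; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "topic": ["topic1", "topic1", "topic1",
              "topic2", "topic2", "topic2", "topic2"],
    "category": ["Sports", "Science", "Sports",
                 "Politics", "Economy", "Politics", "Politics"],
})

def top_value_count(x, n=1):
    # value_counts() sorts counts in descending order by default,
    # so head(n) keeps the n most frequent categories of the group.
    return x.value_counts().head(n)

# The result is a Series indexed by (topic, category) holding the counts.
top = df.groupby("topic")["category"].apply(top_value_count)
print(top)
```

To get, say, the two most frequent categories per topic, extra arguments can be passed through apply: df.groupby("topic")["category"].apply(top_value_count, n=2).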
Upvotes: 3