How to get the highest value row after grouping two columns and getting value counts in Pandas Dataframe?

Question

I'm grouping by one column (topic) and counting the values of another (category) with the following line of code:

df.groupby('topic')['category'].value_counts()

I get the following output:

topic                 category     

topic1            Entertainment    1303
                  Science           462
                  Sports            351
                  Economy           270
                  Business          161
                  Technology         92
                  Education          40
                  Politics           18
                  Environment         5

topic2            Politics          134
                  Economy           133
                  Entertainment     110
                  Sports             69
                  Business           68
                  Science            45
                  Technology         22
                  Education           7
                  Environment         2

topic3            Entertainment    1370
                  Sports            533
                  Economy           485
                  Science           335
                  Business          207
                  Politics          180
                  Education         108
                  Technology         97
                  Environment        12

I want to get only the top row for each topic (that is, its most frequent category), like this:

topic                 category     

topic1            Entertainment    1303
topic2            Politics          134
topic3            Entertainment    1370
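The shape of that result can be reproduced with a small DataFrame. The rows below are made up for illustration and are not the asker's data. groupby('topic')['category'].value_counts() returns a Series indexed by a (topic, category) MultiIndex, and within each topic the counts are sorted in descending order.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'topic'/'category' columns
df = pd.DataFrame({
    "topic": ["topic1"] * 6 + ["topic2"] * 6,
    "category": [
        "Entertainment", "Entertainment", "Entertainment", "Science", "Science", "Sports",
        "Politics", "Politics", "Politics", "Economy", "Economy", "Sports",
    ],
})

# Count categories within each topic; the result is a Series with a
# (topic, category) MultiIndex, sorted by count (descending) inside each topic
counts = df.groupby("topic")["category"].value_counts()
print(counts)
```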

Shaido · Accepted Answer

In pandas, value_counts sorts the counts in descending order by default, so all you need to do is take the top value from each group and return it. This is easy to do by applying a function:

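import pandas as pd

def top_value_count(x):
    # value_counts() sorts by frequency (descending), so head(1) keeps the most frequent category
    return x.value_counts().head(1)

df.groupby('topic')['category'].apply(top_value_count)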

Change the 1 in head(1) to another number to return more values per topic.
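Below is a sketch of the same approach with the number of rows per topic passed as a parameter, run on hypothetical toy data. The helper name top_n_categories is illustrative, not a pandas API. Since the output of groupby(...).value_counts() is already sorted within each topic, calling .groupby(level=0).head(n) on it should give the same rows without a custom function.

```python
import pandas as pd

# Hypothetical toy data (no tied counts, so the "top" row is unambiguous)
df = pd.DataFrame({
    "topic": ["topic1"] * 6 + ["topic2"] * 6,
    "category": [
        "Entertainment", "Entertainment", "Entertainment", "Science", "Science", "Sports",
        "Politics", "Politics", "Politics", "Economy", "Economy", "Sports",
    ],
})


def top_n_categories(x: pd.Series, n: int = 1) -> pd.Series:
    """Return the n most frequent values of x (value_counts sorts descending)."""
    return x.value_counts().head(n)


# The answer's approach, with n forwarded to the helper by apply()
via_apply = df.groupby("topic")["category"].apply(top_n_categories, n=1)

# Same result without a custom function: value_counts() is already sorted
# within each topic, so keep the first row of each topic (index level 0)
via_head = df.groupby("topic")["category"].value_counts().groupby(level=0).head(1)

print(via_apply)
print(via_head)
```

If two categories tie for the highest count in a topic, head(1) keeps only one of them, so check for ties if that matters for your data.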
