Patrik Novotný

Reputation: 79

Python - pandas, group by and max count

For each value in column cluster-1, I need the most frequent value (max count) from column cluster-2.

Input data: [image]

Output data: [image]

I use the command df.groupby(['cluster-1','cluster-2'])['cluster-2'].count(), which gives me the count of each cluster-2 value within each cluster-1 group. I need advice on how to get from these counts to the most frequent value per group, thanks.

Upvotes: 1

Views: 548

Answers (1)

jezrael

Reputation: 862681

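Use SeriesGroupBy.value_counts, which by default sorts the counts in descending order within each group. The most frequent cluster-2 value therefore comes first for every cluster-1 value. Convert the resulting MultiIndex to a DataFrame with MultiIndex.to_frame, then keep only the first row per cluster-1 with DataFrame.drop_duplicates: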

df1 = (df.groupby(['cluster-1'])['cluster-2']
         .value_counts()
         .index
         .to_frame(index=False)
         .drop_duplicates('cluster-1'))
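
To see the steps on concrete values, here is a minimal sketch. The sample DataFrame is hypothetical, since the original input was posted only as an image; only the column names cluster-1 and cluster-2 come from the question. The trailing reset_index(drop=True) is an optional addition that just renumbers the rows.

```python
import pandas as pd

# Hypothetical sample data (the question's real input is not available).
df = pd.DataFrame({
    "cluster-1": ["a", "a", "a", "b", "b", "b", "b"],
    "cluster-2": ["x", "y", "y", "z", "z", "z", "x"],
})

# Counts of each cluster-2 value per cluster-1 group,
# sorted in descending order within each group.
counts = df.groupby("cluster-1")["cluster-2"].value_counts()

# The MultiIndex holds the (cluster-1, cluster-2) pairs in that order,
# so the first pair per cluster-1 is the most frequent one.
df1 = (counts.index
             .to_frame(index=False)
             .drop_duplicates("cluster-1")
             .reset_index(drop=True))

print(df1)  # 'a' is paired with 'y', 'b' is paired with 'z'
```

If two cluster-2 values are tied for the top count within a group, only one of them is kept, and which one should not be relied on.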

Upvotes: 2
