Reputation: 79
I need the most similar (max count) from column cluster-1 from column cluster-2.
Input - data
Output - data
I use the command: df.groupby(['cluster-1','cluster-2'])['cluster-2'].count() this command will give me count per occurrence in the column cluster-2. I need advice on how to proceed, thanks.
Upvotes: 1
Views: 548
Reputation: 862681
Use SeriesGroupBy.value_counts
because by default sorted values, so possible convert MultiIndex
to DataFrame
by MultiIndex.to_frame
and then remove duplicates by cluster-1
in DataFrame.drop_duplicates
:
df1 = (df.groupby(['cluster-1'])['cluster-2']
.value_counts()
.index
.to_frame(index=False)
.drop_duplicates('cluster-1'))
Upvotes: 2