Harry Smith
Harry Smith

Reputation: 33

Finding the most frequent combination in DataFrame

I have a DataFrame with two columns From and To, and I need to know the most frequent combination of locations From and To.

Example:

From        To
------------------
Home        Office
Home        Office
Home        Office
Airport     Home
Restaurant  Office

Upvotes: 3

Views: 1694

Answers (3)

ansev
ansev

Reputation: 30920

IIUC, SeriesGroupBy.value_counts and Series.idxmax

df.groupby('From')['To'].value_counts().idxmax()

Output

('Home', 'Office')

in general groupby.value_counts is faster than groupby.size

Another way:

df.apply(tuple, axis=1).value_counts().idxmax()

Or

df.apply(tuple, axis=1).mode()

Output

0    (Home, Office)
dtype: object

Upvotes: 1

EddyG
EddyG

Reputation: 685

if the order does matter:

df['FROM_TO'] = df['FROM'] + df['TO']

df['COUNT'] = 1

df.groupby(['FROM_TO'])['COUNT'].sum()

gives you all the occurrences in one go. Simply take the max to find the largest occurrence.

If the order does matter first sort the values before:

df.loc[:,:] = np.sort(df.values,axis=1) # if the df only consists of the FROM adn TO columns.

Upvotes: 2

Roberto Trani
Roberto Trani

Reputation: 1227

You can group by the two columns together and count the number of occurrences of each pair, then sort the pairs by this count.

The following code does the job:

df.groupby(["From", "To"]).size().sort_values(ascending=False)

and, for the example of the question, it returns:

From        To
-----------------------
Home        Office    3
Restaurant  Office    1
Airport     Home      1

Upvotes: 1

Related Questions