Reputation: 33
I have a DataFrame with two columns From
and To
, and I need to know the most frequent combination of locations From
and To
.
Example:
From To
------------------
Home Office
Home Office
Home Office
Airport Home
Restaurant Office
Upvotes: 3
Views: 1694
Reputation: 30920
IIUC, SeriesGroupBy.value_counts
and Series.idxmax
df.groupby('From')['To'].value_counts().idxmax()
Output
('Home', 'Office')
in general groupby.value_counts
is faster than groupby.size
Another way:
df.apply(tuple, axis=1).value_counts().idxmax()
Or
df.apply(tuple, axis=1).mode()
Output
0 (Home, Office)
dtype: object
Upvotes: 1
Reputation: 685
if the order does matter:
df['FROM_TO'] = df['FROM'] + df['TO']
df['COUNT'] = 1
df.groupby(['FROM_TO'])['COUNT'].sum()
gives you all the occurrences in one go. Simply take the max to find the largest occurrence.
If the order does matter first sort the values before:
df.loc[:,:] = np.sort(df.values,axis=1) # if the df only consists of the FROM adn TO columns.
Upvotes: 2
Reputation: 1227
You can group by the two columns together and count the number of occurrences of each pair, then sort the pairs by this count.
The following code does the job:
df.groupby(["From", "To"]).size().sort_values(ascending=False)
and, for the example of the question, it returns:
From To
-----------------------
Home Office 3
Restaurant Office 1
Airport Home 1
Upvotes: 1