Ahamed Moosa
Ahamed Moosa

Reputation: 1445

String mode aggregation with group by function

I have dataframe which looks like below

Country  City
UK       London
USA      Washington
UK       London
UK       Manchester
USA      Washington
USA      Chicago

I want to group country and aggregate on the most repeated city in a country

My desired output should be like

Country City
UK      London
USA     Washington

Because London and Washington appears 2 times whereas Manchester and Chicago appears only 1 time.

I tried

from scipy.stats import mode
df_summary = df.groupby('Country')['City'].\
                        apply(lambda x: mode(x)[0][0]).reset_index()

But it seems it won't work on strings

Upvotes: 2

Views: 885

Answers (2)

Karn Kumar
Karn Kumar

Reputation: 8816

try like below:

>>> df.City.mode()
0        London
1    Washington
dtype: object

OR

import pandas as pd
from scipy import stats

Can use scipy with stats + lambda :

df.groupby('Country').agg({'City': lambda x:stats.mode(x)[0]})
               City
Country
UK           London
USA      Washington

#  df.groupby('Country').agg({'City': lambda x:stats.mode(x)[0]}).reset_index()

However, it gives nice count as well if you don't want to return ony First value:

>>> df.groupby('Country').agg({'City': lambda x:stats.mode(x)})
                        City
Country
UK           ([London], [2])
USA      ([Washington], [2])

Upvotes: 1

jpp
jpp

Reputation: 164693

I can't replicate your error, but you can use pd.Series.mode, which accepts strings and returns a series, using iat to extract the first value:

res = df.groupby('Country')['City'].apply(lambda x: x.mode().iat[0]).reset_index()

print(res)

  Country        City
0      UK      London
1     USA  Washington

Upvotes: 1

Related Questions