proximacentauri
proximacentauri

Reputation: 1879

pandas groupby replace based on condition

I have a dataset structures as below:

index country  city     Data
0     AU       Sydney   23
1     AU       Sydney   45
2     AU       Unknown  2
3     CA       Toronto  56
4     CA       Toronto  2
5     CA       Ottawa   1
6     CA       Unknown  2

I want to replace 'Unknown' in the city column with the mode of the occurences of cities per country. The result would be:

...
2     AU       Sydney  2
...
6     CA       Toronto  2

I can get the city modes with:

city_modes = df.groupby('country')['city'].apply(lambda x: x.mode().iloc[0])

And I can replace values with:

df['column']=df.column.replace('Unknown', 'something')

But i cant work out how to combine these to only replace unknowns for each country based on mode of occurrence of cities.

Any ideas?

Upvotes: 2

Views: 1577

Answers (1)

jezrael
jezrael

Reputation: 862441

Use transform for Series with same size as original DataFrame and set new values by numpy.where:

city_modes = df.groupby('country')['city'].transform(lambda x: x.mode().iloc[0])
df['column'] = np.where(df['column'] == 'Unknown',city_modes, df['column'])

Or:

df.loc[df['column'] == 'Unknown', 'column'] = city_modes 

Upvotes: 3

Related Questions