Vishwa Mittar
Vishwa Mittar

Reputation: 388

GroupBy pandas DataFrame and select most common value which is alphabetically first

I have a data frame with multiple columns but currently, I am focusing on province and city columns. The problem is I want to fill NaN values in the city column by the most frequent city value in the same province, but if the value count of any city is the same then it should fill with the one which is alphabetically first. I followed an answer from this link and modified it, but values are not being sorted alphabetically for some reason.

Here's my code:

df.groupby(['province'])['city'].agg(lambda x:x.value_counts().sort_values(ascending=False).index[0])

what do I need to change?

Sample DataFrame:

{'province': {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'A',
  5: 'A',
  6: 'A',
  7: 'B',
  8: 'B',
  9: 'B',
  10: 'B',
  11: 'B',
  12: 'B',
  13: 'B'},
 'city': {0: 'newyork',
  1: nan,
  2: 'newyork',
  3: 'london',
  4: 'london',
  5: nan,
  6: 'houston',
  7: 'hyderabad',
  8: 'karachi',
  9: nan,
  10: 'hyderabad',
  11: nan,
  12: 'beijing',
  13: 'karachi'}}

Required Output:

{'province': {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'A',
  5: 'A',
  6: 'A',
  7: 'B',
  8: 'B',
  9: 'B',
  10: 'B',
  11: 'B',
  12: 'B',
  13: 'B'},
 'city': {0: 'newyork',
  1: 'london',
  2: 'newyork',
  3: 'london',
  4: 'london',
  5: 'london',
  6: 'houston',
  7: 'hyderabad',
  8: 'karachi',
  9: 'hyderabad',
  10: 'hyderabad',
  11: 'hyderabad',
  12: 'beijing',
  13: 'karachi'}}

Upvotes: 0

Views: 676

Answers (1)

not_speshal
not_speshal

Reputation: 23146

Try with groupby and mode:

mapper = df.groupby("province")["city"].agg(lambda x: x.mode().sort_values()[0]).to_dict()
df["city"] = df["city"].where(df["city"].notnull(),
                              df["province"].map(mapper))

>>> df
   province       city
0         A    newyork
1         A     london
2         A    newyork
3         A     london
4         A     london
5         A     london
6         A    houston
7         B  hyderabad
8         B    karachi
9         B  hyderabad
10        B  hyderabad
11        B  hyderabad
12        B    beijing
13        B    karachi

Upvotes: 1

Related Questions