How to fill na in pandas by the mode of a group

Question

I have a pandas DataFrame like this:

  df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   Nan
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   Nan
       a2                   b2
       a3                   Nan
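
A minimal sketch to reproduce the example. It assumes the Nan cells arrive as the literal text "Nan", for instance after typing the table in or loading it from a text file. pandas treats only real missing values such as np.nan or None as NA, so the strings need converting before isna(), fillna() or mode() will treat them as missing:

```python
import numpy as np
import pandas as pd

# The question's example, typed in literally (the "Nan" cells are strings here)
df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", "Nan", "b1", "b2", "b2", "Nan", "b2", "Nan"],
})
print(df["b"].isna().sum())  # 0: the text "Nan" is not a missing value

# Turn the placeholder text into real missing values so isna/fillna/mode see them
df["b"] = df["b"].replace("Nan", np.nan)
print(df["b"].isna().sum())  # 3
```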

Each value of a can have several different values of b. I want to fill every NaN in b with the mode of b within the corresponding group of a.

The resulting dataframe should look like the following:

  df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   b1
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   b2
       a2                   b2
       a3                   b2

Above, b1 is the mode of b for a1, and b2 is the mode for a2. Finally, a3 has no non-missing b values, so its NaN is filled with the global mode, b2.
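
The modes quoted above can be checked directly. This sketch assumes the missing cells are real np.nan values; a3 has no non-missing b, so it simply drops out of the per-group result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})

# Per-group mode over the non-missing b values (a3 has none, so it drops out)
local_modes = (
    df.dropna(subset=["b"])
      .groupby("a")["b"]
      .agg(lambda s: s.mode().iloc[0])
)
print(local_modes.to_dict())  # {'a1': 'b1', 'a2': 'b2'}

# Global mode over the whole column; mode() ignores NaN by default
global_mode = df["b"].mode().iloc[0]
print(global_mode)  # b2
```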

In other words, each NaN in column b should be replaced by the most frequent b value among the rows that share its value of a.

EDIT:

If a group of a has no data at all in b (every value is NaN), fill it with the global mode instead.
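
Both rules can also be expressed in a single pass with a per-group function that falls back to the global mode when a group has nothing to take a mode of. This is a sketch, not the accepted answer's method; the helper name group_mode is hypothetical, and the data assumes real np.nan for the missing cells:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})

global_mode = df["b"].mode().iloc[0]

def group_mode(s: pd.Series):
    """Mode of one group's b values, or the global mode if the group is all NaN."""
    m = s.mode()  # NaN is dropped, so an all-NaN group gives an empty result
    return m.iloc[0] if not m.empty else global_mode

# transform broadcasts each group's scalar back onto that group's rows;
# fillna then only touches the rows that were missing
df["b"] = df["b"].fillna(df.groupby("a")["b"].transform(group_mode))
print(df["b"].tolist())
# ['b1', 'b2', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2', 'b2', 'b2']
```

The result matches the expected frame in the question, including b2 for a3.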

Quang Hoang · Accepted Answer

Try:

# lazy grouping
groups = df.groupby('a')

# True on every row whose group has no non-missing b at all
all_na = groups['b'].transform(lambda x: x.isna().all())

# fill those groups with the global mode
df.loc[all_na, 'b'] = df['b'].mode().iloc[0]

# fill the remaining NaNs with the local (per-group) mode;
# regroup so the grouping is built from the updated column
mode_by_group = df.groupby('a')['b'].transform(lambda x: x.mode().iloc[0])
df['b'] = df['b'].fillna(mode_by_group)
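
One detail worth knowing about taking the first mode: when several values tie for the highest count, Series.mode() returns all of them in sorted order, so picking the first element resolves ties to the smallest value. A small illustration with a hypothetical tied series:

```python
import pandas as pd

# Hypothetical group where b1 and b2 each appear twice
s = pd.Series(["b2", "b1", "b2", "b1"])

print(s.mode().tolist())  # ['b1', 'b2']  (every mode, sorted)
print(s.mode().iloc[0])   # b1            (the one the answer's code keeps)
```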
