learner

Reputation: 877

How to fill na in pandas by the mode of a group

I have a Pandas Dataframe like this:

  df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   NaN
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   NaN
       a2                   b2
       a3                   NaN

For each value of a, b can take several different values. I want to fill every NaN in b with the mode of b within the group defined by the corresponding value of a.

The resulting dataframe should look like the following:

  df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   **b1**
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   **b2**
       a2                   b2
       a3                   **b2**

Above, b1 is the mode of b for a1, and b2 is the mode of b for a2. Group a3 has no non-NaN values of b, so it is filled with the global mode, b2.

In other words, each NaN in column b should be replaced by the mode of b among the rows that share the same value of a.

EDIT:

If a group of a has no data in b at all, fill it with the global mode of b.

Upvotes: 2

Views: 2128

Answers (2)

Golden Feather

Reputation: 61

You are getting IndexError: index out of bounds because the last value of column a, a3, has no corresponding non-NaN value in column b, so that group has no mode to fill with. One solution is to wrap the fillna call in a try/except block and then apply ffill and bfill. Here is the code:

import numpy as np
import pandas as pd

data_stack = [['a1', 'b1'], ['a1', 'b2'], ['a1', 'b1'], ['a1', np.nan], ['a2', 'b1'],
              ['a2', 'b2'], ['a2', 'b2'], ['a2', np.nan], ['a2', 'b2'], ['a3', np.nan]]
df_try_stack = pd.DataFrame(data_stack, columns=["a", "b"])

# Fill the NaN values of a group with that group's mode
def fillna_group(grp):
    try:
        return grp.fillna(grp.mode()[0])
    except (KeyError, IndexError) as e:
        # The group has no non-NaN values, so it has no mode; return it unchanged
        print('Error as no corresponding group: ' + str(e))
        return grp

df_try_stack["b"] = df_try_stack["b"].fillna(
    df_try_stack.groupby("a")["b"].transform(fillna_group))
# Fill groups that are still all-NaN from the neighbouring rows
df_try_stack = df_try_stack.ffill(axis=0)
df_try_stack = df_try_stack.bfill(axis=0)
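
One caveat worth checking: ffill copies the value from the previous row, which is not necessarily the global mode. In the sample data the row above a3 happens to hold b2, which is also the global mode, so the two coincide. A minimal sketch with made-up values (the series below is hypothetical, not taken from the question) shows how they can differ:

```python
import numpy as np
import pandas as pd

# Hypothetical column: the most frequent value is b2, but the row just
# before the NaN holds b1.
b = pd.Series(["b2", "b2", "b1", np.nan])

# Forward fill copies the previous row's value into the NaN.
forward_filled = b.ffill()

# Filling with the global mode uses the most frequent value instead.
mode_filled = b.fillna(b.mode()[0])

print(forward_filled.tolist())
print(mode_filled.tolist())
```

If the global mode is what is required, filling the remaining NaN values with the column's mode is probably the safer choice than ffill followed by bfill.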

Upvotes: 0

Quang Hoang

Reputation: 150785

Try:

# lazy grouping
groups = df.groupby('a')

# rows whose group has only NaN in b
all_na = groups['b'].transform(lambda x: x.isna().all())

# fill global mode
df.loc[all_na, 'b'] = df['b'].mode()[0]

# fill with local mode
mode_by_group = groups['b'].transform(lambda x: x.mode()[0])
df['b'] = df['b'].fillna(mode_by_group)
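
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea on the question's sample data. It is a variation rather than the exact code above: the helper name fill_with_group_mode and its key/col parameters are made up for illustration, and it builds one fill value per row instead of modifying df in two steps. When a group has tied modes, it takes the first value returned by mode(), which is sorted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})


def fill_with_group_mode(frame, key="a", col="b"):
    """Hypothetical helper: fill NaN in `col` with the mode of `col` per `key`
    group, falling back to the global mode for groups with no data."""
    out = frame.copy()

    # Most frequent value over the whole column (NaN is ignored by mode()).
    global_mode = out[col].mode()[0]

    # One mode per group; groups that contain only NaN get NaN here.
    group_mode = out.groupby(key)[col].agg(
        lambda s: s.mode().iloc[0] if s.notna().any() else np.nan
    )

    # Look up each row's group mode, then fall back to the global mode.
    fill_values = out[key].map(group_mode).fillna(global_mode)

    # Only the NaN cells are replaced; existing values are kept.
    out[col] = out[col].fillna(fill_values)
    return out


result = fill_with_group_mode(df)
print(result)
```

On the sample data this fills a1 with b1, a2 with b2, and a3 with the global mode b2, matching the expected output in the question.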

Upvotes: 3
