Reputation: 877
I have a Pandas Dataframe like this:
df =
a b
a1 b1
a1 b2
a1 b1
a1 Nan
a2 b1
a2 b2
a2 b2
a2 Nan
a2 b2
a3 Nan
For every value of a, b can have multiple values corresponding to it. I want to fill all the NaN values of b with the mode of b, grouped by the corresponding value of a.
The resulting dataframe should look like the following:
df =
a b
a1 b1
a1 b2
a1 b1
a1 ***b1***
a2 b1
a2 b2
a2 b2
a2 **b2**
a2 b2
a3 b2
Above, b1 was the mode of b corresponding to a1. Similarly, b2 was the mode corresponding to a2. Finally, a3 had no data, so it is filled with the global mode, b2.
In other words, every NaN value in column b should be filled with the mode of column b among the rows that share the same value of a.
EDIT:
If there is a group a for which there is no data on b, then fill it with the global mode.
Upvotes: 2
Views: 2128
Reputation: 61
You are getting IndexError: index out of bounds because the last value in column a, a3, has no corresponding non-NaN value in column b, so that group has no mode to fill with. One solution is to wrap the fillna call in a try/except block and then apply ffill and bfill. Here is the code:
import numpy as np
import pandas as pd

data_stack = [['a1','b1'],['a1','b2'],['a1','b1'],['a1',np.nan],['a2','b1'],
              ['a2','b2'],['a2','b2'],['a2',np.nan],['a2','b2'],['a3',np.nan]]
df_try_stack = pd.DataFrame(data_stack, columns=["a","b"])

# Fill the NaN values of a group with that group's mode
def fillna_group(grp):
    try:
        return grp.fillna(grp.mode()[0])
    except (IndexError, KeyError) as e:
        # No non-NaN value in this group, so there is no mode; return it unchanged
        print('Error as no corresponding group: ' + str(e))
        return grp

df_try_stack["b"] = df_try_stack["b"].fillna(
    df_try_stack.groupby(["a"])['b'].transform(fillna_group))
# Fill what is still NaN from neighbouring rows; this copies the adjacent value,
# which equals the global mode b2 here only because of the row order
df_try_stack = df_try_stack.ffill(axis=0)
df_try_stack = df_try_stack.bfill(axis=0)
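To illustrate where the error comes from: Series.mode() drops NaN by default, so a group that is entirely NaN returns an empty result and there is no element 0 to take. Below is a minimal sketch of a guard that avoids the try/except. The helper name first_mode and its default parameter are made up for this example and are not part of pandas.

```python
import numpy as np
import pandas as pd

# A group with no data, like a3 in the question
all_nan = pd.Series([np.nan, np.nan], dtype=object)
modes = all_nan.mode()   # NaN is dropped by default, so nothing is left
assert modes.empty       # indexing element 0 of this result is what fails

def first_mode(values, default=np.nan):
    """Hypothetical helper: first mode of `values`, or `default` if there is none."""
    found = values.mode()
    return found.iloc[0] if not found.empty else default

# A group with data, like a1 in the question
mixed = pd.Series(["b1", "b2", "b1", np.nan], dtype=object)
group_value = first_mode(mixed)
fallback_value = first_mode(all_nan, default="b2")
```

With a guard like this, the fallback for empty groups can be passed in explicitly (for example the global mode) rather than relying on whatever value ffill or bfill happens to copy.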
Upvotes: 0
Reputation: 150785
Try:
# lazy grouping
groups = df.groupby('a')
# rows whose whole group is NaN in b
all_na = groups['b'].transform(lambda x: x.isna().all())
# fill global mode
df.loc[all_na, 'b'] = df['b'].mode()[0]
# fill with local mode
mode_by_group = groups['b'].transform(lambda x: x.mode()[0])
df['b'] = df['b'].fillna(mode_by_group)
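For anyone who wants to check this against the data in the question, here is a rough self-contained sketch of the same idea (group mode first, global mode as fallback) wrapped in a function. The name fill_with_group_mode and its parameters are my own for illustration, and it assumes the column has at least one non-NaN value so that a global mode exists.

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(df, key, col):
    """Hypothetical wrapper: fill NaN in `col` with the mode of its `key` group,
    falling back to the global mode for groups that are entirely NaN."""
    out = df.copy()
    # Assumption: `col` contains at least one non-NaN value
    global_mode = out[col].mode().iloc[0]
    # Group mode broadcast to every row; NaN where the group has no data
    group_mode = out.groupby(key)[col].transform(
        lambda x: x.mode().iloc[0] if x.notna().any() else np.nan
    )
    out[col] = out[col].fillna(group_mode).fillna(global_mode)
    return out

# The sample data from the question
df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})
result = fill_with_group_mode(df, "a", "b")
```

Note that when several values tie for most frequent, mode() returns all of them in sorted order, so taking the first element picks the smallest of the tied values.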
Upvotes: 3