Reputation: 33
Working with pandas, I have a dataframe with two hierarchies A
and B
, where B can be NaN
, and I want to fill some NaNs in D
in a particular way:
In the example below, A has "B-subgroups" where there are no values at all for D (e.g. (1, 1)
), while A also has values for D in other subgroups (e.g. (1, 3)
).
Now I want to get the mean
of each subgroup (120, 90 and 75
for A==1
), find the median
of these means (90
for A==1
) and use this median to fill NaNs in the other subgroups of A==1
.
Groups like A==2
, where there are only NaNs for D, should not be filled.
Groups like A==3
, where there are some values for D but only rows with B being NaN have NaN in D, should not be filled if possible (I intend to fill these later with the mean of all values of D of their whole A groups).
Example df:
d = {'A': [1, 1, 1, 1, 1, 1, 1, 2, 3, 3, 3],
'B': [1, 2, 3, 3, 4, 5, 6, 1, 1, np.NaN, np.NaN],
'D': [np.NaN, np.NaN, 120, 120, 90, 75, np.NaN, np.NaN, 60, 50, np.NaN]}
df = pd.DataFrame(data=d)
A B D
1 1 NaN
1 2 NaN
1 3 120
1 3 120
1 4 90
1 5 75
1 6 NaN
2 1 NaN
3 1 60
3 NaN 50
3 NaN NaN
Expected result:
A B D
1 1 90
1 2 90
1 3 120
1 3 120
1 4 90
1 5 75
1 6 90
2 1 NaN
3 1 60
3 NaN 50
3 NaN NaN
With df.groupby(['A', 'B'])['D'].mean().groupby(['A']).agg('median')
or .median()
I seem to get the right values, but using
df['D'] = df['D'].fillna(
df.groupby(['A', 'B'])['D'].mean().groupby(['A']).agg('median')
)
does not seem to change any values in D.
Any help is greatly appreciated, I've been stuck on this for a while and cannot find any solution anywhere.
Upvotes: 3
Views: 286
Reputation: 42886
Your first step is correct. After that we use Series.map
to map the correct medians to each group in column A
.
Finally we use np.where
to conditionally fill in column D
if B is not NaN
:
medians = df.groupby(['A', 'B'])['D'].mean().groupby(['A']).agg('median')
df['D'] = np.where(df['B'].notna(), # if B is not NaN
df['D'].fillna(df['A'].map(medians)), # fill in the median
df['D']) # else keep the value of column D
A B D
0 1 1.00 90.00
1 1 2.00 90.00
2 1 3.00 120.00
3 1 3.00 120.00
4 1 4.00 90.00
5 1 5.00 75.00
6 1 6.00 90.00
7 2 1.00 nan
8 3 1.00 60.00
9 3 nan 50.00
10 3 nan nan
Upvotes: 3