Reputation: 10355
I have a DataFrame with MultiIndex columns:
import numpy as np
import pandas as pd
columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])
values = [
[1, 2, 3, 4],
[np.nan, 6, 7, 8],
[np.nan, 10, np.nan, 12],
]
df = pd.DataFrame(values, columns=columns)
n1 n2
p m p m
0 1.0 2 3.0 4
1 NaN 6 7.0 8
2 NaN 10 NaN 12
Now I want to set m
to NaN
whenever p
is NaN
. Here's the result I'm looking for:
n1 n2
p m p m
0 1.0 2.0 3.0 4.0
1 NaN NaN 7.0 8.0
2 NaN NaN NaN NaN
I know how to find out where p
is NaN
, for example using
mask = df.xs('p', level=1, axis=1).isnull()
n1 n2
0 False False
1 True False
2 True True
However, I don't know how to use this mask to set the corresponding m
values in df
to NaN
.
Upvotes: 1
Views: 138
Reputation: 148870
You can stack and unstack the transposed dataframe to be able to easily select and change values, and then again stack, unstack and transpose to get it back:
df = df.T.stack(dropna=False).unstack(level=1)
df.loc[df['p'].isna(), 'm'] = np.nan
df = df.stack(dropna=False).unstack(1).T
After first line, df
is:
m p
n1 0 2.0 1.0
1 6.0 NaN
2 10.0 NaN
n2 0 4.0 3.0
1 8.0 7.0
2 12.0 NaN
And after last:
n1 n2
m p m p
0 2.0 1.0 4.0 3.0
1 NaN NaN 8.0 7.0
2 NaN NaN NaN NaN
Upvotes: 2
Reputation: 88226
You can use pd.IndexSlice
to obtain a boolean ndarray indicating whether values are NaN
or not in the p
column on level 1
and then replacing False
to NaN
, and also to replace the values in m
by multiplying the result:
x = df.loc[:, pd.IndexSlice[:,'p']].notna().replace({False:float('nan')}).values
df.loc[:, pd.IndexSlice[:,'m']] *= x
n1 n2
p m p m
0 1.0 2 3.0 4
1 NaN NaN 7.0 8
2 NaN NaN NaN NaN
Upvotes: 2