Boolean indexing in Pandas DataFrame with MultiIndex columns

Question

I have a DataFrame with MultiIndex columns:

import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])
values = [
    [1,      2,  3,      4],
    [np.nan, 6,  7,      8],
    [np.nan, 10, np.nan, 12],
]
df = pd.DataFrame(values, columns=columns)

    n1       n2    
     p   m    p   m
0  1.0   2  3.0   4
1  NaN   6  7.0   8
2  NaN  10  NaN  12

Now I want to set m to NaN whenever p is NaN. Here's the result I'm looking for:

    n1        n2     
     p    m    p    m
0  1.0  2.0  3.0  4.0
1  NaN  NaN  7.0  8.0
2  NaN  NaN  NaN  NaN

I know how to find out where p is NaN, for example using

mask = df.xs('p', level=1, axis=1).isnull()

      n1     n2
0  False  False
1   True  False
2   True   True

However, I don't know how to use this mask to set the corresponding m values in df to NaN.

yatu · Accepted Answer

You can use pd.IndexSlice to obtain a boolean ndarray indicating whether values are NaN or not in the p column on level 1 and then replacing False to NaN, and also to replace the values in m by multiplying the result:

x = df.loc[:, pd.IndexSlice[:,'p']].notna().replace({False:float('nan')}).values
df.loc[:, pd.IndexSlice[:,'m']] *= x

       n1        n2     
     p    m    p    m
0  1.0    2  3.0    4
1  NaN  NaN  7.0    8
2  NaN  NaN  NaN  NaN

Boolean indexing in Pandas DataFrame with MultiIndex columns

Answers (2)

Related Questions