Python pandas: How do I drop specific levels in a hierarchical index if any column values are NaN?

Question

I want to drop entire levels (in this case, countries) in my hierarchical index if ANY of the data values for that country are NaN. So I want to go from something like this:

                          M1                   M2
country    year                                   
Arab World 2010           5.240002             NaN
           2009           NaN                  NaN
Bangladesh 2010           6.206065             3.7
           2009           5.708707             NaN
Canada     2010           7.203803             5.8
           2009           6.144833             7.0
Sweden     2010           9.123140             6.0
           2009           5.213283             6.1

to something like this:

                          M1                   M2
country    year                                   
Canada     2010           7.203803             5.8
           2009           6.144833             7.0
Sweden     2010           9.123140             6.0
           2009           5.213283             6.1

I've tried df.dropna() with the thresh option, as well as df.fillna(0) to try and make the deletion of countries easier, but both methods are built to leave levels in the DataFrame if M1 and M2 contain values, e.g. Bangladesh in 2010.

Are there any concise ways to solve this problem?

Vidhya G · Accepted Answer

Yes there is a concise and efficient way to solve this. You were on the right track with df.dropna(), just that you need to unstack your data before you apply it.

>>> print df

                       M1   M2
Country    Year               
Arab World 2009       NaN  NaN
           2010  5.240002  NaN
Bangladesh 2009  5.708707  NaN
           2010  6.206065  3.7
Canada     2009  6.144833  7.0
           2010  7.203803  5.8
Sweden     2009  5.213283  6.1
           2010  9.123140  6.0

pivot the DataFrame to make "Year" the inner most column labels

>>> df1 = df.unstack(level=-1)

remove rows with missing data

>>> df2 = df1.dropna()

reverse the unstacking

>>> print df2.stack()

                    M1   M2
Country Year               
Canada  2009  6.144833  7.0
        2010  7.203803  5.8
Sweden  2009  5.213283  6.1
        2010  9.123140  6.0

Put all this together:

>>> clean = df.unstack(level=-1).dropna().stack()

Python pandas: How do I drop specific levels in a hierarchical index if any column values are NaN?

Answers (2)

Related Questions