dfust
dfust

Reputation: 55

Pandas: Remove index entry (and all it's rows) from multilevel index when all data in a column is NaN

I'd like to clean up some data I have in a dataframe with a multilevel index.

                | A   | B   | 
----------------+-----+-----+
foo  2019-01-01 | x   | NaN |
     2019-01-02 | x   | NaN |
     2019-01-03 | NaN | NaN |
................+.....+.....+
bar  2019-01-01 | NaN | x   |
     2019-01-02 | NaN | y   |
     2019-01-03 | NaN | z   |
................+.....+.....+
baz  2019-01-01 | x   | x   |
     2019-01-02 | x   | x   |
     2019-01-03 | x   | x   |

I'd like to loose the complete group indexed by bar, because all of the data in column A is NaN. I'd like to keep foo, because only some of the data in column A is NaN (column B is not important here, even if it's all NaN). I'd like to keep baz, because not all of column Ais NaN. So my result should look like this:

                | A   | B   | 
----------------+-----+-----+
foo  2019-01-01 | x   | NaN |
     2019-01-02 | x   | NaN |
     2019-01-03 | NaN | NaN |
................+.....+.....+
baz  2019-01-01 | x   | x   |
     2019-01-02 | x   | x   |
     2019-01-03 | x   | x   |

What's the best way to do this with pandas and python? I suppose there is a better way than looping through the data...

Upvotes: 4

Views: 120

Answers (1)

Erfan
Erfan

Reputation: 42916

groupby.transform, notna() & any()

We can groupby on your first level index and then check if any of the values in column A are not NaN.

We use transform to get the same shaped boolean array back so we can use boolean indexing to filter out the correct rows.

m = df['A'].notna().groupby(level=0).transform('any')
df[m]
                  A    B
idx idx2                
foo 2019-01-01    x  NaN
    2019-01-02    x  NaN
    2019-01-03  NaN  NaN
baz 2019-01-01    x    x
    2019-01-02    x    x
    2019-01-03    x    x

What does m return?

m = df['A'].notna().groupby(level=0).transform('any')
print(m)

idx  idx2      
foo  2019-01-01     True
     2019-01-02     True
     2019-01-03     True
bar  2019-01-01    False
     2019-01-02    False
     2019-01-03    False
baz  2019-01-01     True
     2019-01-02     True
     2019-01-03     True
Name: A, dtype: bool

Upvotes: 2

Related Questions