Reputation: 55
I'd like to clean up some data I have in a dataframe with a multilevel index.
| A | B |
----------------+-----+-----+
foo 2019-01-01 | x | NaN |
2019-01-02 | x | NaN |
2019-01-03 | NaN | NaN |
................+.....+.....+
bar 2019-01-01 | NaN | x |
2019-01-02 | NaN | y |
2019-01-03 | NaN | z |
................+.....+.....+
baz 2019-01-01 | x | x |
2019-01-02 | x | x |
2019-01-03 | x | x |
I'd like to loose the complete group indexed by bar, because all of the data in column A is NaN. I'd like to keep foo, because only some of the data in column A is NaN (column B is not important here, even if it's all NaN). I'd like to keep baz, because not all of column Ais NaN. So my result should look like this:
| A | B |
----------------+-----+-----+
foo 2019-01-01 | x | NaN |
2019-01-02 | x | NaN |
2019-01-03 | NaN | NaN |
................+.....+.....+
baz 2019-01-01 | x | x |
2019-01-02 | x | x |
2019-01-03 | x | x |
What's the best way to do this with pandas and python? I suppose there is a better way than looping through the data...
Upvotes: 4
Views: 120
Reputation: 42916
groupby.transform
, notna()
& any()
We can groupby
on your first level index and then check if any of the values in column A are not NaN
.
We use transform
to get the same shaped boolean array back so we can use boolean indexing
to filter out the correct rows.
m = df['A'].notna().groupby(level=0).transform('any')
df[m]
A B
idx idx2
foo 2019-01-01 x NaN
2019-01-02 x NaN
2019-01-03 NaN NaN
baz 2019-01-01 x x
2019-01-02 x x
2019-01-03 x x
What does m
return?
m = df['A'].notna().groupby(level=0).transform('any')
print(m)
idx idx2
foo 2019-01-01 True
2019-01-02 True
2019-01-03 True
bar 2019-01-01 False
2019-01-02 False
2019-01-03 False
baz 2019-01-01 True
2019-01-02 True
2019-01-03 True
Name: A, dtype: bool
Upvotes: 2