Reputation: 2436
I'm having some trouble wrangling a dataframe that looks something like this:
           value
year name
2015 bob    10.0
     cat     NaN
2016 bob     NaN
     cat     NaN
I want to drop those items where all the values for the same name are NaN. In this case the result should be this:
           value
year name
2015 bob    10.0
2016 bob     NaN
All the cat values were NaN so cat is gone. Since bob had one non-NaN value, it gets to stay.
Note that both the 2016 values were NaN in the input, but 2016 is still around in the output - because this rule only applies to the name column. Ideally I'd like to be able to provide which column this applies to as a parameter.
Is this even possible? How should I do this? I'm okay with reindexing/transposing/whatever if that's needed to get the job done (only if it's necessary though!).
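For reproducibility, the example frame above can be built with something like this (the index levels and values are just inferred from the display):
import numpy as np
import pandas as pd

# Rebuild the example: a (year, name) MultiIndex with a single 'value' column.
idx = pd.MultiIndex.from_product([[2015, 2016], ['bob', 'cat']], names=['year', 'name'])
df = pd.DataFrame({'value': [10.0, np.nan, np.nan, np.nan]}, index=idx)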
Upvotes: 2
Views: 69
Reputation: 153460
You can use unstack, isnull, all, and stack:
df.unstack().loc[:,~df.unstack().isnull().all()].stack(-1, dropna=False)
Or use notnull and any:
df.unstack().loc[:,df.unstack().notnull().any()].stack(-1, dropna=False)
Output:
           value
year name
2015 bob    10.0
2016 bob     NaN
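To make the level a parameter as the question asks, the same idea can be wrapped in a small helper. This is just a sketch (the function name is made up, and it assumes the df from the question):
def drop_all_nan_level(frame, level='name'):
    # Pivot the given index level into columns, drop columns that are
    # entirely NaN, then stack the level back without dropping NaN rows.
    wide = frame.unstack(level)
    return wide.loc[:, wide.notnull().any()].stack(level, dropna=False)

drop_all_nan_level(df, level='name')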
Upvotes: 1
Reputation: 294338
You can use groupby with filter:
df.groupby(level='name').filter(lambda x: x.value.notnull().any())
           value
year name
2015 bob    10.0
2016 bob     NaN
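If you want both the grouping level and the value column as parameters, a thin wrapper along these lines should do it (names are illustrative, assuming the df from the question):
def drop_all_nan_groups(frame, level='name', column='value'):
    # Keep only the groups at `level` that have at least one non-NaN entry in `column`.
    return frame.groupby(level=level).filter(lambda g: g[column].notnull().any())

drop_all_nan_groups(df, level='name', column='value')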
Upvotes: 2
Reputation: 210852
In [208]: df.reset_index().sort_values('name').drop_duplicates(['value']).set_index(['year','name'])
Out[208]:
           value
year name
2015 bob    10.0
2016 bob     NaN
Upvotes: 1