spinningarrow
spinningarrow

Reputation: 2436

Drop items with all NaN values from a pandas multi-indexed dataframe

I'm having some trouble wrangling a dataframe that looks something like this:

                    value
year        name
2015        bob     10.0
            cat     NaN
2016        bob     NaN
            cat     NaN

I want to drop those items where all the values for the same name are NaN. In this case the result should be this:

                    value
year        name
2015        bob     10.0
2016        bob     NaN

All the cat values were NaN so cat is gone. Since bob had one non-NaN value, it gets to stay.

Note that both the 2016 values were NaN in the input, but 2016 is still around in the output - because this rule only applies to the name column. Ideally I'd like to be able to provide which column this applies to as a parameter.

Is this even possible? How should I do this? I'm okay with reindexing/transposing/whatever if that's needed to get the job done (only if it's necessary though!).

Upvotes: 2

Views: 69

Answers (3)

Scott Boston
Scott Boston

Reputation: 153460

You can use unstack, isnull, all, and stack:

df.unstack().loc[:,~df.unstack().isnull().all()].stack(-1, dropna=False)

Or use notnull and any:

df.unstack().loc[:,df.unstack().notnull().any()].stack(-1, dropna=False)

Output:

              value
year name       
2015 bob    10.0
2016 bob     NaN

Upvotes: 1

piRSquared
piRSquared

Reputation: 294338

You can use groupby with filter

df.groupby(level='name').filter(lambda x: x.value.notnull().any())

           value
year name       
2015 bob    10.0
2016 bob     NaN

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210852

In [208]: df.reset_index().sort_values('name').drop_duplicates(['value']).set_index(['year','name'])
Out[208]:
           value
year name
2015 bob    10.0
2016 bob     NaN

Upvotes: 1

Related Questions