Bluetail
Bluetail

Reputation: 1291

How to sum up missing values per row in pandas dataframe

I have a dataframe:

import pandas as pd

d = {
'Country': ["Austria", "Austria", "Belgium", "USA", "USA", "USA", "USA"], 
'Number2020': [15, None, 18, 20, 22, None,  30],
'Number2021': [20, 25, 18, None, None, None, 32],    
}

df = pd.DataFrame(data=d)
df

    Country   Number2020    Number2021
0   Austria   15.0          20.0
1   Austria   NaN           25.0
2   Belgium   18.0          18.0
3   USA       20.0          NaN
4   USA       22.0          NaN
5   USA       NaN           NaN
6   USA       30.0          32.0

and I want to sum up the nan values per each country. E.g.

    Country       Count_nans
    Austria       1
    
    USA           4

I have filtered the dataframe to leave only the rows with nans .

df_nan = df[df.Number2021.isna() | df.Number2020.isna()]

    Country   Number2020    Number2021
1   Austria   NaN           25.0
3   USA       20.0          NaN
4   USA       22.0          NaN
5   USA       NaN           NaN

So it looks like a groupby operation? I have tried this.

nasum2021 = df_nan['Number2021'].isna().sum()

df_nan['countNames2021'] = df_nan.groupby(['Number2021'])['Number2021'].transform('count').fillna(nasum2021)
df_nan

It gives me 1 nan for Austria but 3 for the United States while it should be 4. so that is not right. In my real dataframe, I have some 10 years and around 30 countries. thank you!

Upvotes: 3

Views: 2865

Answers (4)

keramat
keramat

Reputation: 4543

Use:

df.groupby('Country').apply(lambda x: x.isna().sum().sum())

Output:

enter image description here

Upvotes: 0

Sash Sinha
Sash Sinha

Reputation: 22360

You could use pandas.DataFrame.agg along with pandas.DataFrame.isna:

>>> df.groupby('Country').agg(lambda x: x.isna().sum()).sum(axis=1)
Country
Austria    1
Belgium    0
USA        4
dtype: int64

Upvotes: 1

jezrael
jezrael

Reputation: 862611

Solution for processing all columns without Country - first convert it to index, test missing values and aggregate sum, last sum columns:

s = df.set_index('Country').isna().groupby('Country').sum().sum(axis=1)
print (s)
Country
Austria    1
Belgium    0
USA        4
dtype: int64

If need remove 0 values add boolean indexing:

s = s[s.ne(0)]

Upvotes: 4

mozway
mozway

Reputation: 260570

You could use:

df.filter(like='Number').isna().sum(1).groupby(df['Country']).sum()

output:

Country
Austria    1
Belgium    0
USA        4
dtype: int64

or, filtering the rows with NaN first to only count the countries with at least 1 NaN:

df[df.filter(like='Number').isna().any(1)].groupby('Country')['Country'].count()

output:

Country
Austria    1
USA        3
Name: Country, dtype: int64

Upvotes: 2

Related Questions