Matthi9000
Matthi9000

Reputation: 1237

Pandas sum of two columns - dealing with nan-values correctly

When summing two pandas columns, I want to ignore nan-values when one of the two columns is a float. However when nan appears in both columns, I want to keep nan in the output (instead of 0.0).

Initial dataframe:

Surf1     Surf2
0         0
NaN       8
8         15
NaN       NaN
16        14
15        7

Desired output:

Surf1     Surf2     Sum
0         0         0
NaN       8         8
8         15        23
NaN       NaN       NaN
16        14        30
15        7         22

Tried code: -> the code below ignores nan-values but when taking the sum of two nan-values, it gives 0.0 in the output where I want to keep it as NaN in that particular case to keep these empty values separate from values that are actually 0 after summing.

import pandas as pd
import numpy as np

data = pd.DataFrame({"Surf1": [10,np.nan,8,np.nan,16,15], "Surf2": [22,8,15,np.nan,14,7]})
print(data)

data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1)
print(data)

Upvotes: 5

Views: 13824

Answers (5)

Vahan Sargsyan
Vahan Sargsyan

Reputation: 28

I think All the solutions listed above work only for the cases when when it is the FIRST column value that is missing. If you have cases when the first column value is non-missing but the second column value is missing, try using:

df['sum'] = df['Surf1']

df.loc[(df['Surf2'].notnull()), 'sum'] = df['Surf1'].fillna(0) + df['Surf2']

Upvotes: 1

BENY
BENY

Reputation: 323346

You can use min_count, this will sum all the row when there is at least on not null, if all null return null

df['SUM']=df.sum(min_count=1,axis=1)
#df.sum(min_count=1,axis=1)
Out[199]: 
0     0.0
1     8.0
2    23.0
3     NaN
4    30.0
5    22.0
dtype: float64

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150785

You can do:

df['Sum'] = df.dropna(how='all').sum(1)

Output:

   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0

Upvotes: 3

SubOptimal
SubOptimal

Reputation: 22963

From the documentation pandas.DataFrame.sum

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([]).sum() # min_count=0 is the default 0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

Change your code to

data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1, min_count=1)

output

   Surf1  Surf2
0   10.0   22.0
1    NaN    8.0
2    8.0   15.0
3    NaN    NaN
4   16.0   14.0
5   15.0    7.0
   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0

Upvotes: 12

yatu
yatu

Reputation: 88285

You could mask the result by doing:

df.sum(1).mask(df.isna().all(1))

0     0.0
1     8.0
2    23.0
3     NaN
4    30.0
5    22.0
dtype: float64

Upvotes: 3

Related Questions