Reputation: 11293
import pandas as pd
import numpy as np
d = {'l': ['left', 'right', 'left', 'right', 'left', 'right'],
     'r': ['right', 'left', 'right', 'left', 'right', 'left'],
     'v': [-1, 1, -1, 1, -1, np.nan]}
df = pd.DataFrame(d)
When a grouped DataFrame contains an np.NaN value, I want the grouped sum to be NaN, as it is with the skipna=False flag of pd.Series.sum and pd.DataFrame.sum. For example:
In [235]: df.v.sum(skipna=False)
Out[235]: nan
However, this behavior is not reflected in the pandas.DataFrame.groupby object:
In [237]: df.groupby('l')['v'].sum()['right']
Out[237]: 2.0
and it cannot be forced by applying np.sum directly:
In [238]: df.groupby('l')['v'].apply(np.sum)['right']
Out[238]: 2.0
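A plausible explanation for the result above: when given a pandas Series, np.sum defers to the Series' own sum method, whose skipna defaults to True, whereas on a plain NumPy array NaN propagates. A minimal check, using a three-element Series that mirrors the 'right' group:

```python
import numpy as np
import pandas as pd

# Values of the 'right' group from the question
s = pd.Series([1, 1, np.nan])

# np.sum on a Series dispatches to Series.sum, which skips NaN by default
series_result = np.sum(s)

# np.sum on the underlying ndarray lets NaN propagate
array_result = np.sum(s.to_numpy())

print(series_result, array_result)
```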
I can work around this by doing:
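check_cols = ['v']
# Mark rows that contain a NaN in any of the checked columns
df['flag'] = df[check_cols].isnull().any(axis=1)
# Select several columns with a list; sum the flags per group,
# then blank out the sum for any group that had a flagged row
df.groupby('l')[['v', 'flag']].sum().apply(
    lambda x: x.v if not x.flag else np.nan,
    axis=1
)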
but this is ugly. Is there a better method?
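For comparison, the same flag-then-mask idea can be expressed without a helper column: compute the NaN-skipping sums, compute a per-group "contains NaN" flag, and blank out the flagged groups with Series.mask. A sketch under the assumption that only column v matters; it rebuilds the question's df so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'r': ['right', 'left', 'right', 'left', 'right', 'left'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})

# NaN-skipping sum per group (the pandas default)
sums = df.groupby('l')['v'].sum()

# True for every group that contains at least one NaN
has_nan = df['v'].isna().groupby(df['l']).any()

# Series.mask replaces values with NaN where the aligned condition is True
result = sums.mask(has_nan)
print(result)
```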
Upvotes: 13
Views: 7035
Reputation: 1
Alexis' answer is great, but it could perhaps be improved with:
no_skipna_sum = lambda x: pd.core.series.Series.sum(x, skipna=False)
This gives more flexibility and can be used with the named-aggregation syntax:
df.groupby(col).agg(agg_col_name = (col_to_agg, no_skipna_sum))
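A self-contained sketch of that usage on the question's data, with the lambda written as a named function and combined with the built-in 'count' aggregation to show how the two mix. The output column names v_sum and v_count are illustrative only; named aggregation requires pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'r': ['right', 'left', 'right', 'left', 'right', 'left'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})


def no_skipna_sum(x):
    """Sum a group's values, returning NaN if any value is NaN."""
    return pd.Series.sum(x, skipna=False)


# Named aggregation: new_column=(source_column, function)
result = df.groupby('l').agg(
    v_sum=('v', no_skipna_sum),
    v_count=('v', 'count'),  # number of non-null values per group
)
print(result)
```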
Upvotes: 0
Reputation: 18668
I think this is inherent to pandas. One workaround is:
df.groupby('l')['v'].apply(np.array).apply(sum)
to mimic the numpy way,
or
df.groupby('l')['v'].apply(pd.Series.sum, skipna=False)  # for series, or
df.groupby('l')[['v']].apply(pd.DataFrame.sum, skipna=False)  # for dataframes.
to call the right sum function with skipna=False.
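A runnable check of these workarounds on the question's data. The NumPy-style variant is written here as np.sum over each group's underlying array, an equivalent form of the same idea rather than the answer's exact code; the other two forward skipna=False through GroupBy.apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'r': ['right', 'left', 'right', 'left', 'right', 'left'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})

g = df.groupby('l')['v']

# NumPy-style: summing the raw ndarray lets NaN propagate
via_numpy = g.apply(lambda s: np.sum(s.to_numpy()))

# Series variant: extra keyword arguments to apply are forwarded to Series.sum
via_series_sum = g.apply(pd.Series.sum, skipna=False)

# DataFrame variant: select a list of columns, forward skipna to DataFrame.sum
via_frame_sum = df.groupby('l')[['v']].apply(pd.DataFrame.sum, skipna=False)

print(via_numpy, via_series_sum, via_frame_sum, sep='\n\n')
```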
Upvotes: 9
Reputation: 50220
I'm not sure where this falls on the ugliness scale, but it works:
>>> series_sum = pd.core.series.Series.sum
>>> df.groupby('l')['v'].agg(series_sum, skipna=False)
l
left -3
right NaN
Name: v, dtype: float64
I just dug up the sum method you used when you took df.v.sum, which supports the skipna option:
>>> help(df.v.sum)
Help on method sum in module pandas.core.generic:
sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs) method
of pandas.core.series.Series instance
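Since skipna is an ordinary keyword of Series.sum, it can also be bound ahead of time with functools.partial, giving a reusable callable to pass to agg. This is an alternative to passing skipna=False as an extra agg argument, not something shown in the answer; sum_keep_nan is a hypothetical name.

```python
from functools import partial

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'r': ['right', 'left', 'right', 'left', 'right', 'left'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})

# Bind skipna=False once; the result is a callable that takes a Series
sum_keep_nan = partial(pd.Series.sum, skipna=False)

result = df.groupby('l')['v'].agg(sum_keep_nan)
print(result)
```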
Upvotes: 4
Reputation: 210932
Is that what you want?
In [24]: df.groupby('l')['v'].agg(lambda x: np.nan if x.isnull().any() else x.sum())
Out[24]:
l
left -3.0
right NaN
Name: v, dtype: float64
or
In [22]: df.groupby('l')['v'].agg(lambda x: x.sum() if x.notnull().all() else np.nan)
Out[22]:
l
left -3.0
right NaN
Name: v, dtype: float64
Upvotes: 1