Reputation: 11905
I am aggregating my Pandas dataframe, data. Specifically, I want to get the average and sum of amount by tuples of [origin and type]. For averaging and summing I tried the functions below:
import numpy as np
import pandas as pd
result = data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum, pd.Series.mean]}).reset_index()
My issue is that the amount column includes NaNs, which causes the result of the above code to have a lot of NaN averages and sums.
I know both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaNs here?
I also tried this, which obviously did not work:
data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()
EDIT:
Following @Korem's suggestion, I also tried to use a partial, as below:
from functools import partial
s_na_mean = partial(pd.Series.mean, skipna = True)
data.groupby(groupbyvars).agg({'amount': [ np.nansum, s_na_mean ]}).reset_index()
but get this error:
error: 'functools.partial' object has no attribute '__name__'
Upvotes: 17
Views: 41852
Reputation: 94
It might be too late, but it may still be useful for others.
Try the apply function:
import numpy as np
import pandas as pd

def nan_agg(x):
    # Keep only the non-NaN amounts of this group before aggregating.
    # The mask must be negated with ~ ("not" on a Series raises a ValueError).
    amount = x.loc[~x['amount'].isnull(), 'amount']
    res = {}
    res['nansum'] = amount.sum()
    res['nanmean'] = amount.mean()
    return pd.Series(res, index=['nansum', 'nanmean'])

result = data.groupby(groupbyvars).apply(nan_agg).reset_index()
Upvotes: 3
Reputation: 11744
Use numpy's nansum and nanmean:
from numpy import nansum
from numpy import nanmean
data.groupby(groupbyvars).agg({'amount': [ nansum, nanmean]}).reset_index()
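To illustrate what these two functions do on their own, here is a minimal sketch on a small array. The sample values are made up for illustration and are not from the question's data.

```python
import numpy as np

# Hypothetical sample values containing one NaN
amounts = np.array([1.0, np.nan, 3.0])

# A plain sum propagates the NaN
plain_sum = np.sum(amounts)

# nansum and nanmean ignore NaN entries
nan_sum = np.nansum(amounts)    # 1.0 + 3.0
nan_mean = np.nanmean(amounts)  # (1.0 + 3.0) / 2

print(plain_sum, nan_sum, nan_mean)
```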
As a workaround for older versions of numpy, and also a way to fix your last attempt: when you write pd.Series.sum(skipna=True), you actually call the method instead of passing it to agg. If you want to use it like this, you need to define a partial. So if you don't have nanmean, let's define s_na_mean and use that:
from functools import partial
s_na_mean = partial(pd.Series.mean, skipna = True)
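The question reports that passing a functools.partial to agg fails with "'functools.partial' object has no attribute '__name__'". One way around that, sketched here under assumptions, is to wrap the call in a plain named function, which always carries a __name__. The sample dataframe and the helper name s_na_sum are hypothetical; s_na_mean, groupbyvars and the column names follow the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]

def s_na_sum(s):
    # Sum of one group's values, skipping NaN
    return s.sum(skipna=True)

def s_na_mean(s):
    # Mean of one group's values, skipping NaN
    return s.mean(skipna=True)

# Named functions have a __name__, which is used for the result column labels
result = (
    data.groupby(groupbyvars)
        .agg({"amount": [s_na_sum, s_na_mean]})
        .reset_index()
)
print(result)
```

The result columns under amount are labelled s_na_sum and s_na_mean, taken from the function names.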
Upvotes: 18