Reputation: 46
I want to add a column after the agg() function with the number of non-NaNs. "len" does not distinguish between NaNs and non-Nans. In the example bellow, I am exploring the fact that np.sum and np.mean only use non-NaNs into account. But there must be a better way of reaching the same result?
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 1, 1, 0, 0, 0], 'b': [4, np.NaN, 2, 1, 6, np.NaN]})
df = df.groupby("a")['b'].agg([np.sum, np.mean, np.median, np.std, len])
df['count_non_nan'] = df['sum'] / df['mean']
df
Upvotes: 0
Views: 96
Reputation: 31011
Define a function counting non-null values in the source Series:
def countNotNull(grp):
return grp.notnull().sum()
Don't be misguided by the name of notnull function. It takes into account also NaN values.
Then include this function in your aggregation:
df.groupby("a")['b'].agg([np.sum, np.mean, np.median, np.std, len, countNotNull])
The result is:
sum mean median std len countNotNull
a
0 7.0 3.5 3.5 3.535534 3.0 2.0
1 6.0 3.0 3.0 1.414214 3.0 2.0
Upvotes: 0
Reputation: 37
Replace np.NaN's with 0.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 1, 1, 0, 0, 0], 'b': [4, np.NaN, 2, 1, 6, np.NaN]})
df = df.replace(np.nan, 0)
df = df.groupby("a")['b'].agg([np.sum, np.mean, np.median, np.std, len])
df['count_non_nan'] = df['sum'] / df['mean']
df
print(df)
Upvotes: 1