SM_Erd

Reputation: 57

pandas groupby - custom function

I have the following dataframe to which I use groupby and sum():

import numpy as np
import pandas as pd

d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"], 'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data=d)

df.groupby("col1").sum()

This results in the following:

      col2
col1      
A      6.0
B     15.0
C      0.0

I want C to show NaN instead of 0, since all of the values for C are NaN. How can I accomplish this? With apply() and a lambda function? Any help would be appreciated.

Upvotes: 3

Views: 170

Answers (3)

Scott Boston

Reputation: 153460

Thanks to @piRSquared, @Alollz, and @anky_91:

You can do this without set_index() and reset_index() by passing min_count=1 to the groupby sum:

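import numpy as np
import pandas as pd

d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"], 'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data=d)

# min_count=1: a group needs at least one non-NaN value, otherwise its sum is NaN
df.groupby("col1", as_index=False).sum(min_count=1)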

Output:

  col1  col2
0    A   6.0
1    B  15.0
2    C   NaN
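To see why the question's C row came out as 0.0 while min_count=1 gives NaN, here is a minimal Series-level sketch (the values are illustrative, not from the question). pandas' sum defaults to min_count=0, so a sum over nothing but NaNs is treated like an empty sum and returns 0.0.

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan, np.nan])
partial = pd.Series([4.0, np.nan, 6.0])

# Default min_count=0: NaNs are skipped, leaving an empty sum of 0.0
print(all_nan.sum())             # 0.0
# min_count=1: at least one non-NaN value is required, otherwise NaN
print(all_nan.sum(min_count=1))  # nan
# NaNs are still skipped when there are enough valid values
print(partial.sum(min_count=1))  # 10.0
```

The groupby sum applies the same rule to each group separately, which is why only C turns into NaN.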

Upvotes: 2

anky

Reputation: 75080

Use this:

# skipna=False: any NaN in a group makes that group's sum NaN
df.groupby('col1').apply(pd.DataFrame.sum, skipna=False).reset_index(drop=True)
# Or --> df.groupby('col1', as_index=False).apply(pd.DataFrame.sum, skipna=False)

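Without the apply(), thanks to @piRSquared. Thanks also to @Alollz: min_count=1 returns NaN only for groups that are entirely NaN, so groups that contain some NaNs still get their sum (skipna=False would turn those into NaN as well):

df.set_index('col1').sum(level=0, min_count=1).reset_index()
# The level= argument was removed in pandas 2.0; there, use:
# df.set_index('col1').groupby(level=0).sum(min_count=1).reset_index()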

Output (of the apply() version; summing the col1 strings concatenates them, hence AAA):

  col1  col2
0  AAA   6.0
1  BBB  15.0
2  CCC   NaN
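The difference @Alollz points out only appears when a group is partly NaN. Here is a minimal sketch on a hypothetical variant of the data (df2, with one NaN in group B) comparing the skipna=False route with min_count=1. It applies pd.Series.sum to col2 only, so the col1 strings are not concatenated.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: group B now has one NaN
df2 = pd.DataFrame({
    'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    'col2': [1, 2, 3, 4, np.nan, 6, np.nan, np.nan, np.nan],
})

# skipna=False (forwarded by apply to Series.sum): any NaN makes the group NaN
strict = df2.groupby('col1')['col2'].apply(pd.Series.sum, skipna=False)
# A: 6.0, B: NaN, C: NaN

# min_count=1: NaN only when a group has no valid values at all
lenient = df2.groupby('col1')['col2'].sum(min_count=1)
# A: 6.0, B: 10.0, C: NaN

print(pd.DataFrame({'skipna=False': strict, 'min_count=1': lenient}))
```

For the question's data both give the same result, because C is the only group with any NaN.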

Upvotes: 3

bravosierra99
bravosierra99

Reputation: 1371

Make the call to sum use the parameter skipna=False.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html

That link should provide the documentation you need, and I expect it will fix your problem.
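Depending on your pandas version, GroupBy.sum may not accept skipna directly. One version-independent way to apply this suggestion (and the lambda idea from the question) is to call Series.sum(skipna=False) inside agg for each group. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
     'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)

# Each group's col2 values arrive as a Series; skipna=False lets any NaN
# propagate, so the all-NaN group C sums to NaN instead of 0.0
result = (
    df.groupby('col1')['col2']
      .agg(lambda s: s.sum(skipna=False))
      .reset_index()
)
print(result)
```

Keep in mind that skipna=False also yields NaN for a group with only some NaN values; min_count=1 is the option that returns NaN only for all-NaN groups.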

Upvotes: 1
