Reputation: 473
Suppose I have the following df:
Index | label | X1 | X2
0 | H | 50 | nan
1 | H | 150 | nan
2 | Y | 150| 20
3 | Y | 200| nan
I want to group df by label and sum X1 and X2. The only caveat is that if all of the values for a label are nan, the final output for that label has to be nan.
Desired results
Index | label | X1 | X2
0 | H | 200 | nan
1 | Y | 350 | 20
df.groupby(['label']).sum() does not produce this output. Instead it gives the following, which is not what I want:
Index | label | X1 | X2
0 | H | 200 | 0
1 | Y | 350 | 20
Upvotes: 1
Views: 467
Reputation: 862511
You can add the min_count=1 parameter to GroupBy.sum:
min_count int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
df1 = df.groupby('label', as_index=False).sum(min_count=1)
print(df1)
label X1 X2
0 H 200 NaN
1 Y 350 20.0
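For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the frame from the question and compares the default sum with the min_count=1 version. The variable names default_sum and guarded_sum are only illustrative, and the frame is reconstructed by hand from the table in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the table).
df = pd.DataFrame({
    "label": ["H", "H", "Y", "Y"],
    "X1": [50, 150, 150, 200],
    "X2": [np.nan, np.nan, 20, np.nan],
})

# Default behaviour (min_count=0): NaN values are skipped, so an
# all-NaN group sums to 0.
default_sum = df.groupby("label", as_index=False).sum()

# With min_count=1, each group needs at least one non-NA value;
# otherwise the result for that group is NaN.
guarded_sum = df.groupby("label", as_index=False).sum(min_count=1)

print(default_sum)
print(guarded_sum)
```

Label Y still sums to 20 in X2 because it has one valid value, which meets min_count=1; only label H, where every X2 value is missing, becomes NaN.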
Upvotes: 4