pandas groupby with nan

Question

Suppose I have the following df:

 Index |  label | X1  | X2
 0     |   H    | 50  | nan
 1     |   H    | 150 | nan
 2     |   Y    | 150 | 20
 3     |   Y    | 200 | nan

I want to group df by label and sum X1 and X2 within each group. The only caveat is that if all of the values for a label are nan, the result for that label has to be nan as well.

Desired results

 Index |  label | X1  | X2 
 0     |   H    | 200 | nan   
 1     |   Y    | 350 | 20

df.groupby(['label']).sum() does not produce this output. It returns 0 for X2 when every value in the group is nan, which is not what I want:

 Index |  label | X1  | X2 
 0     |   H    | 200 | 0   
 1     |   Y    | 350 | 20
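
For anyone who wants to reproduce this, here is a minimal sketch. The question only shows the printed table, so building df with pd.DataFrame and np.nan is an assumption about how the frame was created.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame shown in the question.
df = pd.DataFrame({
    "label": ["H", "H", "Y", "Y"],
    "X1": [50, 150, 150, 200],
    "X2": [np.nan, np.nan, 20, np.nan],
})

# Default behavior: NaN values are skipped, so an all-NaN group sums to 0.
default_sum = df.groupby("label").sum()
print(default_sum)
```

Group H has no valid X2 values, yet its sum comes back as 0 rather than nan.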

jezrael · Accepted Answer

You can pass the min_count=1 parameter to GroupBy.sum. From the documentation:

min_count : int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

df1 = df.groupby('label', as_index=False).sum(min_count=1)
print(df1)
  label   X1    X2
0     H  200   NaN
1     Y  350  20.0
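
A self-contained sketch of the same call, for readers who want to run it directly. The construction of df is assumed from the table in the question, and the second call with min_count=2 is only an illustration of how the threshold works, not something the question asks for.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame from the question.
df = pd.DataFrame({
    "label": ["H", "H", "Y", "Y"],
    "X1": [50, 150, 150, 200],
    "X2": [np.nan, np.nan, 20, np.nan],
})

# min_count=1: a group needs at least one non-NA value, otherwise the sum is NaN.
result = df.groupby("label", as_index=False).sum(min_count=1)
print(result)

# Illustration only: with min_count=2, group Y also becomes NaN in X2,
# because it has just one non-NA value in that column.
strict = df.groupby("label").sum(min_count=2)
print(strict)
```

With min_count=1, X2 is NaN for H and 20.0 for Y, matching the desired output. X1 is unaffected in both calls because every group has two valid values there.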
