sum values in column grouped by another column pandas

Question

My df looks like this:

country   id       x       y
AT        11      50     100
AT        12      NaN     90
AT        13      NaN    104
AT        22      40      50
AT        23      30      23
AT        61      40      88
AT        62      NaN     78
UK        11      40      34
UK        12      NaN     22
UK        13      NaN     70

For each country separately, I need the sum of the y column placed on the first row where x is not NaN, with rows grouped by the first digit of the id column. At the end I just need to drop the rows where x is NaN.

The result should be something like this:

country   id       x       y
AT        11      50     294
AT        22      40      50
AT        23      30      23
AT        61      40     166
UK        11      40     126

Henry Yik · Accepted Answer

Use groupby, transform and dropna:

print(
    df.assign(y=df.groupby(df["x"].notnull().cumsum())["y"].transform("sum"))
      .dropna(subset=["x"])
)

  country  id     x    y
0      AT  11  50.0  294
3      AT  22  40.0   50
4      AT  23  30.0   23
5      AT  61  40.0  166
7      UK  11  40.0  126
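
To see why this works, it can help to look at the grouping key on its own. The sketch below rebuilds the sample frame from the question and prints the intermediate labels. The name `block` and the extra `"country"` key are my own additions, not part of the answer above. I added the country key on the assumption that a country could start with a NaN in x, in which case grouping by the cumulative count alone would fold that row into the previous country's total.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})

# Every non-NaN x starts a new block; cumsum turns the True/False flags
# into a running block label. "block" is a hypothetical helper name.
block = df["x"].notna().cumsum().rename("block")
print(block.tolist())

# Sum y within each (country, block) pair, broadcast the total back to
# every row with transform, then keep only the rows that have an x value.
result = (
    df.assign(y=df.groupby(["country", block])["y"].transform("sum"))
      .dropna(subset=["x"])
)
print(result)
```

On the sample data this gives the same rows as the answer above, because every country there begins with a non-NaN x. The country key only matters when that is not the case.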
