Python Dataframe how to sum row values with groupby

Question

I'm trying to group a DataFrame by the column 'Over_Id' and sum the values of the column runs_scored within each group.

If I use groupby, I lose my other columns.

E.g.:

ball.groupby(['Match_Id','Innings_Id','Over_Id'])['runs_scored'].sum()

I was able to get the summed runs_scored column, but in a new DataFrame rather than in my original one, as seen in the image. I can't merge, because the runs_scored sum is based on 3 columns.

In short, I want only 1 entry for each Over_Id and its corresponding runs_scored.

How can I do that?

cs95 · Accepted Answer

You could just group by every column besides the runs_scored column, and then find the sum.

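# every column except runs_scored becomes a grouping key
c = df.columns.difference(['runs_scored']).tolist()
# as_index=False keeps the keys as ordinary columns in the result
df = df.groupby(c, as_index=False).runs_scored.sum()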
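
To see what this does, here is a minimal sketch on made-up data. The question's real columns are only visible in the image, so Team_Batting is a hypothetical stand-in for "the other columns", and the values are invented.

```python
import pandas as pd

# Hypothetical ball-by-ball data; Team_Batting and all values are assumptions.
ball = pd.DataFrame({
    "Match_Id":     [1, 1, 1, 1, 1],
    "Innings_Id":   [1, 1, 1, 1, 1],
    "Over_Id":      [1, 1, 1, 2, 2],
    "Team_Batting": ["A", "A", "A", "A", "A"],
    "runs_scored":  [0, 4, 1, 6, 2],
})

# Group by every column except the one being summed.
keys = ball.columns.difference(["runs_scored"]).tolist()

# One row per over, with the other columns preserved as ordinary columns.
per_over = ball.groupby(keys, as_index=False)["runs_scored"].sum()
print(per_over)
```

Note that columns.difference returns the names sorted, so the key columns come out in alphabetical order. This also only collapses to one row per over if every other column is constant within an over; a column that varies per ball would have to be dropped first, otherwise each ball stays its own group.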

On a side note, it seems you have a lot of redundant data entries. Have you looked at normalising your tables?
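
As a rough sketch of what normalising could look like here, assuming a hypothetical Team_Batting column that repeats on every row of an innings: keep the repeated facts in one small table, keep the per-over totals in another, and merge on the shared id columns only when both are needed. merge accepts a list of key columns, so joining on more than one column is fine.

```python
import pandas as pd

# Hypothetical data; Team_Batting and all values are assumptions.
ball = pd.DataFrame({
    "Match_Id":     [1, 1, 1, 1],
    "Innings_Id":   [1, 1, 1, 1],
    "Over_Id":      [1, 1, 2, 2],
    "Team_Batting": ["A", "A", "A", "A"],  # repeated on every row
    "runs_scored":  [4, 1, 6, 2],
})

# Innings-level facts, stored once per (Match_Id, Innings_Id).
innings = ball[["Match_Id", "Innings_Id", "Team_Batting"]].drop_duplicates()

# Per-over totals, keyed by the three id columns only.
overs = ball.groupby(
    ["Match_Id", "Innings_Id", "Over_Id"], as_index=False
)["runs_scored"].sum()

# Join the descriptive columns back on when they are needed.
combined = overs.merge(innings, on=["Match_Id", "Innings_Id"], how="left")
print(combined)
```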
