Sum one column based on similarity of other column values in Python

Question

I want to sum one column based on matching values in another column. I tried the code below, but it gives me an error and doesn't return all the columns. Can anyone help me, please?

df ["sum"]=df.groupby(['id']).agg({'duration': sum}).reset_index()
df

df


x.     y.     m.     n.     duration     id
xx.    rr.    1.1    4.4    66           2
xx.    rr.    1.1    4.4    66           2
xx.    rr.    1.1    4.4    66           2
tt.    uu     2.2    4.4    10           3
tt.    uu     2.2    4.4    55           3

What I want is:

x.     y.     m.     n.     duration           id
xx.    rr.    1.1    4.4    sum(66+66+66)      2
tt.    uu     2.2    4.4    sum(10+55)         3

jezrael · Accepted Answer

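The original assignment fails because the aggregated result is a DataFrame with one row per id and two columns, which cannot be assigned to a single column of df. If you need the first row for each id, use GroupBy.transform with DataFrame.drop_duplicates: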

df["sum"] = df.groupby('id')['duration'].transform('sum')
df1 = df.drop_duplicates('id')
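
As a minimal runnable sketch of this approach, the snippet below rebuilds the question's table by hand. The dtypes are an assumption: the text columns are taken as strings and m., n., duration and id as numbers.

```python
import pandas as pd

# Sample data modelled on the question's table (dtypes are assumed).
df = pd.DataFrame({
    "x.": ["xx.", "xx.", "xx.", "tt.", "tt."],
    "y.": ["rr.", "rr.", "rr.", "uu", "uu"],
    "m.": [1.1, 1.1, 1.1, 2.2, 2.2],
    "n.": [4.4, 4.4, 4.4, 4.4, 4.4],
    "duration": [66, 66, 66, 10, 55],
    "id": [2, 2, 2, 3, 3],
})

# transform("sum") returns a Series aligned to df's index,
# so it can be assigned directly as a new column.
df["sum"] = df.groupby("id")["duration"].transform("sum")

# Keep the first row for each id; "sum" already holds the group total.
df1 = df.drop_duplicates("id")
print(df1)
```

Note that in df1 the duration column still holds the first row's value for each id; the group total is in the sum column.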

Or group by all the other columns and sum duration within each group:

df2 = df.groupby(['x.','y.','m.','n.', 'id'], as_index=False)['duration'].sum()
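
A self-contained sketch of the grouped version, again using a hand-built copy of the question's table (the dtypes are an assumption), might look like this:

```python
import pandas as pd

# Sample data modelled on the question's table (dtypes are assumed).
df = pd.DataFrame({
    "x.": ["xx.", "xx.", "xx.", "tt.", "tt."],
    "y.": ["rr.", "rr.", "rr.", "uu", "uu"],
    "m.": [1.1, 1.1, 1.1, 2.2, 2.2],
    "n.": [4.4, 4.4, 4.4, 4.4, 4.4],
    "duration": [66, 66, 66, 10, 55],
    "id": [2, 2, 2, 3, 3],
})

# Group on every column except duration; as_index=False keeps the
# grouping keys as ordinary columns instead of moving them to the index.
df2 = df.groupby(["x.", "y.", "m.", "n.", "id"], as_index=False)["duration"].sum()
print(df2)
```

Here duration itself holds the total, so no extra column is needed. By default groupby sorts the result by the grouping keys; passing sort=False should keep the groups in order of first appearance instead.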
