Reputation: 450
I want to sum one column over rows that share the same value in another column. I tried the code below, but it raises an error and doesn't keep all the columns. Can anyone help me, please?
df ["sum"]=df.groupby(['id']).agg({'duration': sum}).reset_index()
df
x.   y.   m.   n.   duration  id
xx   rr   1.1  4.4  66        2
xx   rr   1.1  4.4  66        2
xx   rr   1.1  4.4  66        2
tt   uu   2.2  4.4  10        3
tt   uu   2.2  4.4  55        3
What I want is:
x.   y.   m.   n.   duration             id
xx   rr   1.1  4.4  sum(66+66+66) = 198  2
tt   uu   2.2  4.4  sum(10+55) = 65      3
Upvotes: 1
Views: 126
Reputation: 862406
If you need the first row for each id, use GroupBy.transform with DataFrame.drop_duplicates:
df["sum"] = df.groupby('id')['duration'].transform('sum')
df1 = df.drop_duplicates('id')
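As a minimal sketch of this approach, the example below rebuilds the question's data by hand. The values are an assumption based on the table as posted, with the trailing dots on the values treated as formatting noise. `transform('sum')` returns one value per original row, so it can be assigned straight to a new column, whereas the `agg(...).reset_index()` call in the question returns a smaller DataFrame that cannot be stored in a single column.

```python
import pandas as pd

# Sample data reconstructed from the question (values are an assumption).
df = pd.DataFrame({
    "x.": ["xx", "xx", "xx", "tt", "tt"],
    "y.": ["rr", "rr", "rr", "uu", "uu"],
    "m.": [1.1, 1.1, 1.1, 2.2, 2.2],
    "n.": [4.4, 4.4, 4.4, 4.4, 4.4],
    "duration": [66, 66, 66, 10, 55],
    "id": [2, 2, 2, 3, 3],
})

# transform keeps the original row count: every row gets its group's total.
df["sum"] = df.groupby("id")["duration"].transform("sum")

# Keep only the first row of each id; all other columns are preserved.
df1 = df.drop_duplicates("id")
print(df1)
```

Note that df1 still carries the duration of the first row in each group (66 and 10 here); the group total is in the sum column.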
Or group by all the other columns and aggregate:
df2 = df.groupby(['x.','y.','m.','n.', 'id'], as_index=False)['duration'].sum()
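A small runnable sketch of the grouped version, again using sample data reconstructed from the question (an assumption about the real values). This only collapses rows correctly if every other column is constant within each id, as it appears to be in the posted data; otherwise a single id would be split across several output rows.

```python
import pandas as pd

# Sample data reconstructed from the question (values are an assumption).
df = pd.DataFrame({
    "x.": ["xx", "xx", "xx", "tt", "tt"],
    "y.": ["rr", "rr", "rr", "uu", "uu"],
    "m.": [1.1, 1.1, 1.1, 2.2, 2.2],
    "n.": [4.4, 4.4, 4.4, 4.4, 4.4],
    "duration": [66, 66, 66, 10, 55],
    "id": [2, 2, 2, 3, 3],
})

# Group by every column except duration; as_index=False keeps the
# grouping keys as ordinary columns instead of moving them to the index.
df2 = df.groupby(["x.", "y.", "m.", "n.", "id"], as_index=False)["duration"].sum()
print(df2)
```

Here duration itself holds the total, and the rows come back sorted by the grouping keys, so the tt row is listed before the xx row.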
Upvotes: 1