Pythonist
Pythonist

Reputation: 697

Sum certain values between rows

I have a data frame of 370000 rows x 340 columns, in this DataFrame I have mixed numeric and non-numeric values between columns.

I'd like to merge rows, basing on user_id, that certain numeric values are summed, and non-numeric values are preserved.

Given table:

user_id          points    gender    age
11111              10000      male    18
 2222               4200    female    33
11111              30000      male    18
 2222                 10    female    33
11111              20000      male    18

I'd expect this result:

user_id          points    gender     age
11111              60000      male     18
2222                4210      female   33

While summing rows, I need to sum approx. 300 values out of 340, but I'm new to pandas struggling to find the right way to do this, would appreciate any tips.

Upvotes: 1

Views: 90

Answers (2)

ALollz
ALollz

Reputation: 59569

You want to group your original DataFrame by 'user_id'. Then you can aggregate the grouped frame and specify a dictionary for which function to use on each column.

df.groupby('user_id').agg({'points': 'sum', 
                           'gender': 'first', 
                           'age': 'first'}).reset_index()

Outputs:

   user_id  points  gender  age
0     2222    4210  female   33
1    11111   60000    male   18

First will work as long as there is only one unique value in the columns where you want to 'preserve' the value. Otherwise implement mode to keep the most common value, or 'unique' to get a list of all values.

Upvotes: 3

Onk_r
Onk_r

Reputation: 846

Hey you can do that using pandas aggregate sum

like

df.groupby(['user_id'])[["points"]].sum()

df.groupby(['user_id'])[["list_of_attribute"]].sum()

And its also possible to pass multiple columns in list_of_attributes but that would have to subset of columns otherwise KeyError.

Upvotes: 1

Related Questions