Reputation: 437
I need to apply different mathematical operations to different variables in a dataframe. My data looks like this:
y x1 x2 x3
NB 1 4 2
SK 2 5 3
SK 3 6 6
NB 4 7 9
I want to group my data by the y variable and calculate sum(x1) and max(x2). I also have to apply a user-defined function to x3.
The grouped output should be a pandas dataframe with only the four variables y, x1, x2, x3, as shown below.
y x1 x2 x3
NB 5 7 5
SK 5 6 5
I tried some code and searched several websites, but I couldn't find a solution.
Could anyone please help me with this?
Thanks in advance.
Upvotes: 2
Views: 844
Reputation: 10359
When you use .groupby, you can aggregate with .agg. There are certain predefined functions for use in this, but you can also apply whatever user-defined functions you want using lambda, where the argument passed to the function is the values for that group:
from io import StringIO

import pandas as pd

data = StringIO('''y x1 x2 x3
NB 1 4 2
SK 2 5 3
SK 3 6 6
NB 4 7 9''')

def func(values):
    # Example user-defined function: receives one column's values for a group
    return sum(values) / 50

# Raw string so that \s is read as a regex, not a string escape
df = pd.read_csv(data, sep=r'\s+')
summaries = df.groupby('y').agg({'x1': 'sum',
                                 'x2': 'max',
                                 'x3': lambda vals: func(vals)})
print(summaries)
This prints:
    x1  x2    x3
y
NB   5   7  0.22
SK   5   6  0.18
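In the output above, y ends up as the index rather than as a regular column. If the four-column layout from the question (y, x1, x2, x3) is needed, one option is to call reset_index() on the result. The following is a minimal sketch under that assumption; it rebuilds the same sample data with pd.DataFrame and passes func directly instead of wrapping it in a lambda, which should behave the same way here.

```python
import pandas as pd

# Same sample data as in the question, built directly as a DataFrame
df = pd.DataFrame({
    "y": ["NB", "SK", "SK", "NB"],
    "x1": [1, 2, 3, 4],
    "x2": [4, 5, 6, 7],
    "x3": [2, 3, 6, 9],
})

def func(values):
    # Placeholder user-defined function, same as in the answer above
    return sum(values) / 50

summaries = (
    df.groupby("y")
      .agg({"x1": "sum", "x2": "max", "x3": func})
      .reset_index()  # move y from the index back into a regular column
)
print(summaries)
```

Passing as_index=False to groupby is another way to keep y as a column, if that reads better in your code.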
Upvotes: 3
Reputation: 77
df.groupby(df.index)['x1'].agg(lambda x: sum(x.values))
You can change the lambda to whichever operation you want to perform on a given column.
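Note that grouping by df.index only matches the question if y has been set as the index. A small sketch under that assumption, showing the lambda swapped for a different operation on another column; the max-minus-min spread on x3 is a hypothetical example, not something the question asked for.

```python
import pandas as pd

# Assumption: y is the index here, so groupby(df.index) groups by y
df = pd.DataFrame(
    {"x1": [1, 2, 3, 4], "x3": [2, 3, 6, 9]},
    index=pd.Index(["NB", "SK", "SK", "NB"], name="y"),
)

# Sum of x1 per group, as in the one-liner above
x1_sum = df.groupby(df.index)["x1"].agg(lambda x: sum(x.values))

# A different lambda on a different column (hypothetical: spread of x3)
x3_spread = df.groupby(df.index)["x3"].agg(lambda x: x.max() - x.min())

print(x1_sum)
print(x3_spread)
```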
Upvotes: 0