In python, the summarise (dplyr) function analogue

Question

I have a pandas DataFrame df and I would like to group by a variable 'house' and perform specific operations on three other variables: 'var1', 'var2' and 'var3'. Suppose the three variables are numeric, with 'var1' taking values such as 1, 2, 3.

import pandas as pd

data = {'house': ['A', 'B', 'A', 'A', 'B', 'B', 'B'],
        'var1': [3, 0, 1, 3, 4, 5, 3],
        'var2': [2, 0, 5, 1, 4, 1, 3],
        'var3': [4, 2, 3, 3, 0, 5, 1]}
df = pd.DataFrame(data)
df

Now, I would like to create 3 new variables

new_var1 = count the times var1 takes the value 3
new_var2 = sum var2 (simple aggregate)
new_var3 = sum var3 (simple aggregate)

If I were using the R programming language, I could do this directly:

require(dplyr)
data = data.frame('house'=c('A', 'B', 'A', 'A', 'B', 'B', 'B'), 
        'var1'=c(3, 0, 1, 3,4,5,3), 
        'var2'=c(2, 0, 5, 1,4,1,3),
        'var3'=c(4, 2, 3, 3,0,5,1))

df = data %>% group_by(house) %>% summarise(new_var1 = sum(var1 == 3),
                                        new_var2 = sum(var2),
                                        new_var3 = sum(var3))
df

In Python, I first group by 'house' and select the three columns:

df.groupby(['house'])[['var1', 'var2', 'var3']]

But I would like to continue in the same line of code, and I don't know how to do this. Is there an analogue of the 'summarise' function in Python?

fmarm · Accepted Answer

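You can do this with the agg method, passing a dictionary that maps each column to its aggregation and then renaming the resulting columns: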

(df.groupby(['house']).agg({'var1': lambda x: (x==3).sum(), 
                            'var2': 'sum',
                            'var3': 'sum'})
   .rename(columns={"var1": "new_var1", 
                    "var2": "new_var2",
                    "var3":"new_var3"})
)
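
As a possible alternative, and assuming pandas 0.25 or later, named aggregation lets you set the output column names inside agg itself, which reads closer to dplyr's summarise. The sketch below reuses the question's data; the name summary is only illustrative.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'house': ['A', 'B', 'A', 'A', 'B', 'B', 'B'],
                   'var1': [3, 0, 1, 3, 4, 5, 3],
                   'var2': [2, 0, 5, 1, 4, 1, 3],
                   'var3': [4, 2, 3, 3, 0, 5, 1]})

# Named aggregation: each keyword is an output column,
# given as a (source column, aggregation) pair.
summary = df.groupby('house').agg(
    new_var1=('var1', lambda s: (s == 3).sum()),  # count rows where var1 == 3
    new_var2=('var2', 'sum'),
    new_var3=('var3', 'sum'),
)
print(summary)
```

This avoids the separate rename step, at the cost of requiring a reasonably recent pandas version.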
