Reputation: 533
I have a pandas DataFrame df and I would like to group by a variable 'house' and apply specific operations to three other variables: 'var1', 'var2' and 'var3'. Suppose the three variables are numeric and 'var1' takes values 1, 2, 3.
import pandas as pd
data = {'house': ['A', 'B', 'A', 'A', 'B', 'B', 'B'], 'var1': [3, 0, 1, 3, 4, 5, 3], 'var2': [2, 0, 5, 1, 4, 1, 3], 'var3': [4, 2, 3, 3, 0, 5, 1]}
df = pd.DataFrame(data)
df
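Now I would like to create three new variables per house: new_var1, the number of rows where var1 equals 3; new_var2, the sum of var2; and new_var3, the sum of var3.
If I were using the R programming language, I could do it right away: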
require(dplyr)
data = data.frame('house'=c('A', 'B', 'A', 'A', 'B', 'B', 'B'),
'var1'=c(3, 0, 1, 3,4,5,3),
'var2'=c(2, 0, 5, 1,4,1,3),
'var3'=c(4, 2, 3, 3,0,5,1))
df = data %>% group_by(house) %>% summarise(new_var1 = sum(var1 == 3),
new_var2 = sum(var2),
new_var3 = sum(var3))
df
In Python, I start by grouping:
df.groupby(['house'])[['var1', 'var2', 'var3']]
But I would like to continue on the same line of code and I don't know how to do this. Is there an analogue of the 'summarise' function in Python?
Upvotes: 2
Views: 635
Reputation: 3825
I have been porting data packages (dplyr, tidyr, tibble, etc.) from R to Python:
https://github.com/pwwang/datar
If you are familiar with those packages in R and want to use the same style in Python, this is for you:
from datar import f
from datar.all import *
data = tibble(
house=c('A', 'B', 'A', 'A', 'B', 'B', 'B'),
var1=c(3, 0, 1, 3,4,5,3),
var2=c(2, 0, 5, 1,4,1,3),
var3=c(4, 2, 3, 3,0,5,1)
)
df= data >> group_by(f.house) >> summarise(new_var1 = sum(f.var1 == 3),
new_var2 = sum(f.var2),
new_var3 = sum(f.var3))
print(df)
Output:
house new_var1 new_var2 new_var3
0 A 2 8 10
1 B 1 8 8
Upvotes: 0
Reputation: 4284
You can do this using the agg method:
(df.groupby(['house']).agg({'var1': lambda x: (x==3).sum(),
'var2': 'sum',
'var3': 'sum'})
.rename(columns={"var1": "new_var1",
"var2": "new_var2",
"var3":"new_var3"})
)
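As a possible alternative, here is a minimal sketch using named aggregation, which may read closer to dplyr's summarise because the new column names are given directly and no rename step is needed. It assumes pandas 0.25 or newer; the variable name summary is only for illustration.

```python
import pandas as pd

# Same data as in the question
data = {
    "house": ["A", "B", "A", "A", "B", "B", "B"],
    "var1": [3, 0, 1, 3, 4, 5, 3],
    "var2": [2, 0, 5, 1, 4, 1, 3],
    "var3": [4, 2, 3, 3, 0, 5, 1],
}
df = pd.DataFrame(data)

# Named aggregation: new_column=(source_column, aggregation)
# Assumes pandas >= 0.25, where this keyword form of agg is available.
summary = (
    df.groupby("house")
      .agg(
          new_var1=("var1", lambda x: (x == 3).sum()),  # count rows where var1 == 3
          new_var2=("var2", "sum"),
          new_var3=("var3", "sum"),
      )
      .reset_index()  # turn the 'house' index back into a regular column
)
print(summary)
```

The reset_index() call is optional; without it, 'house' stays as the index of the result.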
Upvotes: 4