Reputation: 737
Would you, please help me, to group pandas dataframe by multiple conditions.
Here is how I do it in SQL:
with a as (
select high
,sum( case when qr = 1 and now = 1 then 1 else 0 end ) q1_bad
,sum( case when qr = 2 and now = 1 then 1 else 0 end ) q2_bad
from #tmp2
group by high
)
select a.high from a
where q1_bad >= 2 and q2_bad >= 2 and a.high is not null
Here is the part of the dataset:
import pandas as pd
a = pd.DataFrame()
a['client'] = range(35)
a['high'] = ['02','47','47','47','79','01','43','56','46','47','17','58','42','90','47','86','41','56',
'55','49','47','49','95','23','46','47','80','80','41','49','46','49','56','46','31']
a['qr'] = ['1','1','1','1','2','1','1','2','2','1','1','2','2',
'2','1','1','1','2','1','2','1','2','2','1','1','1','2','2','1','1',
'1','1','1','1','2']
a['now'] = ['0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0',
'0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0']
Thank you very much!
Upvotes: 0
Views: 212
Reputation: 23099
it's very similar, you need to define your columns ahead of the groupby then apply your operation.
assuming you have actual integers and not strings.
import numpy as np
import pandas as pd
a.assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)
).groupby('high')[['q1_bad','q2_bad']].sum()
q1_bad q2_bad
high
01 0 0
02 0 0
17 0 0
23 0 0
31 0 0
41 0 0
42 0 1
43 0 0
46 0 0
47 0 0
49 1 0
55 0 0
56 0 0
58 0 0
79 0 0
80 0 0
86 0 0
90 0 0
95 0 0
for you extra where clause you can filter it one of many ways, but for ease we can add query
at the end.
a.dropna(subset='high').assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)
).groupby('high')[['q1_bad','q2_bad']].sum().query('q2_bad >= 2 and q1_bad >= 2')
Upvotes: 2