Alex Ivanov
Alex Ivanov

Reputation: 737

pandas: groupby with multiple conditions

Would you, please help me, to group pandas dataframe by multiple conditions.

Here is how I do it in SQL:

with a as (
  select high 
,sum( case when qr = 1 and now = 1 then 1 else 0 end ) q1_bad
,sum( case when qr = 2 and now = 1 then 1 else 0 end ) q2_bad
  from #tmp2
  group by high
)
select a.high from a
where q1_bad >= 2 and q2_bad >= 2 and a.high is not null

Here is the part of the dataset:

import pandas as pd
a = pd.DataFrame()

a['client'] = range(35)
a['high'] = ['02','47','47','47','79','01','43','56','46','47','17','58','42','90','47','86','41','56',
'55','49','47','49','95','23','46','47','80','80','41','49','46','49','56','46','31']
a['qr'] = ['1','1','1','1','2','1','1','2','2','1','1','2','2',
'2','1','1','1','2','1','2','1','2','2','1','1','1','2','2','1','1',
'1','1','1','1','2']
a['now'] = ['0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0',
'0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0']

Thank you very much!

Upvotes: 0

Views: 212

Answers (1)

Umar.H
Umar.H

Reputation: 23099

it's very similar, you need to define your columns ahead of the groupby then apply your operation.

assuming you have actual integers and not strings.

import numpy as np
import pandas as pd
a.assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
         q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)

).groupby('high')[['q1_bad','q2_bad']].sum()

     q1_bad  q2_bad
high                
01         0       0
02         0       0
17         0       0
23         0       0
31         0       0
41         0       0
42         0       1
43         0       0
46         0       0
47         0       0
49         1       0
55         0       0
56         0       0
58         0       0
79         0       0
80         0       0
86         0       0
90         0       0
95         0       0

for you extra where clause you can filter it one of many ways, but for ease we can add query at the end.

a.dropna(subset='high').assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
         q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)

).groupby('high')[['q1_bad','q2_bad']].sum().query('q2_bad >= 2 and q1_bad >= 2')

Upvotes: 2

Related Questions