Reputation: 1235
In the df below there are three groups in the variable 'group': 'A', 'AB' and 'C'. The other columns in the df are assigned to a specific group by suffix: var1_A relates to group A, and so forth.
import pandas as pd

data = pd.DataFrame({'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
                     'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass']*2,
                     'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass']*2,
                     'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass']*2
                     })
For each row I want to count the number of times 'pass' occurs. For the rows that belong to group A I only want to count the variables that are connected to group A. I want the result in a new column. This would almost do the job:
data['new_col'] = data[data['group']=='A'][['var1_A', 'var2_A']].isin(['pass']).sum(axis=1)
data['new_col'] = data[data['group']=='AB'][['var1_AB', 'var2_AB']].isin(['pass']).sum(axis=1)
data['new_col'] = data[data['group']=='C'][['var1_C', 'var2_C']].isin(['pass']).sum(axis=1)
However, each assignment overwrites the previous one; I want the results from all groups in the same column. This operation is perhaps possible with a groupby and transform? However, I got stuck figuring it out.
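(For reference, the three per-group assignments above can be collapsed into one loop that writes into a single column. A minimal sketch of that idea, using the df above; the column name 'result' matches the target below:)

```python
import pandas as pd

data = pd.DataFrame({'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
                     'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass']*2,
                     'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass']*2,
                     'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass']*2})

data['result'] = 0
for g in data['group'].unique():
    rows = data['group'] == g                  # rows belonging to this group
    cols = [f'var1_{g}', f'var2_{g}']          # columns tied to this group
    data.loc[rows, 'result'] = data.loc[rows, cols].eq('pass').sum(axis=1)
```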
Target dataframe:
pd.DataFrame({'group':['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
'var1_A':['pass', 'fail', 'pass','fail', 'pass']*2,
'var2_A':['pass', 'pass', 'pass','fail', 'pass']*2,
'var1_AB':['pass', 'pass', 'pass','fail', 'pass']*2,
'var2_AB':['pass', 'pass', 'fail','fail', 'pass']*2,
'var1_C':['pass', 'pass', 'pass','fail', 'pass']*2,
'var2_C': ['fail', 'fail', 'fail','fail', 'pass']*2,
'result':[2,2,2,0,2,1,1,2,0,2]
})
Upvotes: 2
Views: 74
Reputation: 765
# for each row, keep only the columns whose suffix matches its group,
# then count the 'pass' values
dd1 = data.apply(lambda ss: data.filter(regex=".+_{}$".format(ss.group))
                 .loc[ss.name].loc[lambda s: s.eq("pass")].count(), axis=1)
data["result"] = dd1
data
Or with pd.wide_to_long:
# reshape to long format, then keep only the rows where the column
# suffix (col2) matches the row's own group
dd1 = pd.wide_to_long(data.assign(col1=data.index), stubnames=['var1', 'var2'],
                      i=['col1'], j='col2', sep='_', suffix=r'\w+').reset_index() \
        .loc[lambda dd: dd.col2.eq(dd.group)].set_index("col1")
# count 'pass' across var1 and var2 per original row
data.assign(result=dd1.var1.map({"pass": 1}).add(dd1.var2.map({"pass": 1}),
                                                 fill_value=0).fillna(0))
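The same reshape can be written a bit more directly with eq('pass') instead of the map/add chain. A sketch on the question's data; 'index' and 'grp' are names I chose for the row id and the suffix column:

```python
import pandas as pd

data = pd.DataFrame({'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
                     'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass']*2,
                     'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass']*2,
                     'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass']*2})

# one long row per (original row, group suffix); 'grp' holds the suffix
long = pd.wide_to_long(data.reset_index(), stubnames=['var1', 'var2'],
                       i='index', j='grp', sep='_', suffix=r'\w+').reset_index()
# keep the suffix that matches each row's own group, then count 'pass'
matched = long[long['grp'] == long['group']].set_index('index')
data['result'] = matched[['var1', 'var2']].eq('pass').sum(axis=1)
```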
Output:
group var1_A var2_A var1_AB var2_AB var1_C var2_C result
0 A pass pass pass pass pass fail 2
1 AB fail pass pass pass pass fail 2
2 A pass pass pass fail pass fail 2
3 AB fail fail fail fail fail fail 0
4 AB pass pass pass pass pass pass 2
5 C pass pass pass pass pass fail 1
6 C fail pass pass pass pass fail 1
7 A pass pass pass fail pass fail 2
8 A fail fail fail fail fail fail 0
9 AB pass pass pass pass pass pass 2
Upvotes: 1
Reputation: 260640
You can melt, filter and groupby.count:
data['result'] = (data
.rename(columns=lambda x: x.split('_')[-1]) # get only part after "_"
.reset_index().melt(['index', 'group'])
# keep only identical groups and "pass" values
.loc[lambda d: d['group'].eq(d['variable']) & d['value'].eq('pass')]
.groupby('index')['value'].count()
.reindex(data.index, fill_value=0)
)
print(data)
Or another approach using matrices and string comparisons:
df2 = data.set_index('group').eq('pass')
data['result'] = (df2.mul(df2.columns.str.extract(r'_(\w+)', expand=False))
.eq(df2.index, axis=0).sum(axis=1)
.to_numpy()
)
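The masking idea can also be done without the string multiplication, by comparing an array of column suffixes against the group column directly. A sketch on the question's data; `vars_`, `suffix` and `mask` are names I introduce here:

```python
import pandas as pd

data = pd.DataFrame({'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
                     'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass']*2,
                     'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass']*2,
                     'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass']*2,
                     'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass']*2})

vars_ = data.filter(regex=r'^var')              # the var*_<group> columns
suffix = vars_.columns.str.split('_').str[-1]   # 'A', 'A', 'AB', 'AB', 'C', 'C'
# (n_rows, n_cols) mask: True where the column suffix equals the row's group
mask = suffix.to_numpy() == data['group'].to_numpy()[:, None]
data['result'] = (vars_.eq('pass').to_numpy() & mask).sum(axis=1)
```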
Output:
group var1_A var2_A var1_AB var2_AB var1_C var2_C result
0 A pass pass pass pass pass fail 2
1 AB fail pass pass pass pass fail 2
2 A pass pass pass fail pass fail 2
3 AB fail fail fail fail fail fail 0
4 AB pass pass pass pass pass pass 2
5 C pass pass pass pass pass fail 1
6 C fail pass pass pass pass fail 1
7 A pass pass pass fail pass fail 2
8 A fail fail fail fail fail fail 0
9 AB pass pass pass pass pass pass 2
Upvotes: 2