Reputation: 15
I hope I can explain my problem correctly.. I have a dataframe (306x40) with multiple rows containing data of certain group, I need to group them by index, that's no problem. Next I need to compare the rows with another row that has a specific condition. Take this data as an example:
Id Condition var1 var2 var3
1 1 0 1 0
1 3 1 1 0
2 2 0 0 1
2 3 0 0 1
2 1 0 0 1
So I need to compare var1, var2, var3 per Id with the row that has condition 3.. The outcome would let me know that any of the variables is different than the variables in condition 3. Possible outputs:
Id Condition var1 var2 var3 count_false
1 1 false true true 1
1 3 NaN NaN NaN NaN
2 2 true true true 0
2 3 NaN NaN NaN NaN
2 1 true true true 0
Or just simply saying that condition x is not the same as condition 3 for y Id
I hope you guys understand what I need, but I'm happy to elaborate
Thank you so much in advance!
edit to make it more clear:
The data consists of data from 3 different methods to detect bacteria in patients (clinical data), note that per patient it differs which method was used and how many times. So I have method 1, 2, 3, these are the different conditions. The variables are the different kinds of bacteria found. Method 1 and 2 are the golden standard and method 3 is the one that needs to be validated. So I want to see if method 3 gives the same result as method 1 or 2. the 30+ variables are the bacteria and value 1 indicates present bacteria and 0 not present.
Upvotes: 0
Views: 2025
Reputation: 16
#x!=x.loc[i] gives you the truth value of each row when comparing with your desired(condition) row. Rest of the code is below:
df=pd.DataFrame([[1,1,0,1,0],[1,3,1,1,0],[2,2, 0,0,1],[2,3, 0,0,1],[2,1, 0,0,1]],columns=['ID','Condition','var1','var2','var3'])
def comp(x):
i=x.index[x['Condition']==3][0]
#print(x!=x.loc[i])
truth=(x[['var1','var2','var3']]==x.loc[i,['var1','var2','var3']])
x=pd.concat([x[['ID','Condition']],truth],axis=1)
return x
grp=df.groupby('ID').apply(comp)
grp['count_false']=grp[['var1','var2','var3']].apply(lambda x: 3-x.sum(),axis=1)
print(grp)
Upvotes: 0
Reputation: 26221
If I understand correctly, this is just about counting how many var1 .. var2
are different than var3
, row by row. There is no grouping involved.
If that's the case (happy to modify if not), then:
cols = 'var1 var2 var3'.split() # can be extended to more variables
df['count_diff'] = (df[cols[:-1]].values != df[cols[-1]].values[:, None]).sum(axis=1)
On a modified example:
df = pd.DataFrame({
'Id': [1, 1, 2, 2, 2],
'Condition': [1, 3, 2, 3, 1],
'var1': [0, 1, 0, 1, 0],
'var2': [1, 1, 0, 1, 0],
'var3': [0, 0, 1, 1, 1]})
cols = 'var1 var2 var3'.split()
df['count_diff'] = (df[cols[:-1]].values != df[cols[-1]].values[:, None]).sum(axis=1)
print(df)
# output:
Id Condition var1 var2 var3 count_diff
0 1 1 0 1 0 1
1 1 3 1 1 0 2
2 2 2 0 0 1 2
3 2 3 1 1 1 0
4 2 1 0 0 1 2
Upvotes: 1