Pandas dataframe compare multiple rows with specific condition

Question

I hope I can explain my problem correctly.. I have a dataframe (306x40) with multiple rows containing data of certain group, I need to group them by index, that's no problem. Next I need to compare the rows with another row that has a specific condition. Take this data as an example:

Id  Condition   var1   var2   var3
1      1         0      1      0
1      3         1      1      0
2      2         0      0      1
2      3         0      0      1
2      1         0      0      1

So I need to compare var1, var2, var3 per Id with the row that has condition 3.. The outcome would let me know that any of the variables is different than the variables in condition 3. Possible outputs:

Id  Condition   var1   var2   var3  count_false
1      1        false  true   true      1
1      3        NaN    NaN    NaN       NaN
2      2        true   true   true      0
2      3        NaN    NaN    NaN       NaN
2      1        true   true   true      0

Or just simply saying that condition x is not the same as condition 3 for y Id

I hope you guys understand what I need, but I'm happy to elaborate

Thank you so much in advance!

edit to make it more clear:

The data consists of data from 3 different methods to detect bacteria in patients (clinical data), note that per patient it differs which method was used and how many times. So I have method 1, 2, 3, these are the different conditions. The variables are the different kinds of bacteria found. Method 1 and 2 are the golden standard and method 3 is the one that needs to be validated. So I want to see if method 3 gives the same result as method 1 or 2. the 30+ variables are the bacteria and value 1 indicates present bacteria and 0 not present.

user3774410 · Accepted Answer

#x!=x.loc[i] gives you the truth value of each row when comparing with your desired(condition) row. Rest of the code is below:

df=pd.DataFrame([[1,1,0,1,0],[1,3,1,1,0],[2,2, 0,0,1],[2,3, 0,0,1],[2,1, 0,0,1]],columns=['ID','Condition','var1','var2','var3'])
def comp(x):
    i=x.index[x['Condition']==3][0]
    #print(x!=x.loc[i])
    truth=(x[['var1','var2','var3']]==x.loc[i,['var1','var2','var3']])
    x=pd.concat([x[['ID','Condition']],truth],axis=1)
    return x
grp=df.groupby('ID').apply(comp)
grp['count_false']=grp[['var1','var2','var3']].apply(lambda x: 3-x.sum(),axis=1)
print(grp)

Pandas dataframe compare multiple rows with specific condition

Answers (2)

Related Questions