Reputation: 23
Say I have a pandas DataFrame as below:
C30 C25 C20 C15 C10
1 AJA EJE IJI OJO UJU
2 AJA EJE IJI OJO UJU
3 AJA EJE IJI OJO UJU
4 ABA EBE IBI OBO UBU
5 ABA EBE IBI OBO UBU
6 ABA EBE IBI OBO UNU
7 BOB BIB BEB BAB BUB
I would like to validate that all rows with equal values in C30 also have equal values in C25, C20, C15 & C10.
I have approximately 60,000 rows with a lot of variation in the column values; if elements in column C30 are equal, then the corresponding elements in column C25 should also be equal, and so on.
For the example, validation should report a mismatch because the C10 value in row 6 is not equal to the one in rows 4 & 5. What's the most efficient way to do this?
Upvotes: 2
Views: 449
Reputation: 10960
Use DataFrame.duplicated to flag rows that are not unique.
df['is_dup'] = df.duplicated(keep=False)
Output
C30 C25 C20 C15 C10 is_dup
1 AJA EJE IJI OJO UJU True
2 AJA EJE IJI OJO UJU True
3 AJA EJE IJI OJO UJU True
4 ABA EBE IBI OBO UBU True
5 ABA EBE IBI OBO UBU True
6 ABA EBE IBI OBO UNU False
7 BOB BIB BEB BAB BUB False
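For reference, a minimal runnable sketch of this answer, assuming the DataFrame is built from the question's sample data:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'C30': ['AJA', 'AJA', 'AJA', 'ABA', 'ABA', 'ABA', 'BOB'],
    'C25': ['EJE', 'EJE', 'EJE', 'EBE', 'EBE', 'EBE', 'BIB'],
    'C20': ['IJI', 'IJI', 'IJI', 'IBI', 'IBI', 'IBI', 'BEB'],
    'C15': ['OJO', 'OJO', 'OJO', 'OBO', 'OBO', 'OBO', 'BAB'],
    'C10': ['UJU', 'UJU', 'UJU', 'UBU', 'UBU', 'UNU', 'BUB'],
}, index=range(1, 8))

# keep=False marks every member of a duplicated set as True, so rows
# that differ in any column (row 6) and fully unique rows (row 7)
# both come out False
df['is_dup'] = df.duplicated(keep=False)
print(df)
```

Note that row 7 is flagged False simply because it is unique, even though its C30 group is internally consistent, so this flag marks full-row duplicates rather than directly answering the per-group consistency question.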
Upvotes: 1
Reputation: 862511
You can use DataFrameGroupBy.nunique
and compare for not equal to 1 with DataFrame.ne:
m = df.groupby('C30').nunique().ne(1)
print (m)
C25 C20 C15 C10
C30
ABA False False False True
AJA False False False False
BOB False False False False
And then, if you need the mismatched values, use np.where:
import numpy as np

i, c = np.where(m)
L = list(zip(m.index[i], m.columns[c]))
if len(L) > 0:
    print('mismatch')
    print(L)
mismatch
[('ABA', 'C10')]
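If a per-row flag is preferred over the grouped summary, the same check can also be expressed with GroupBy.transform, which broadcasts the per-group unique count back onto every row. A sketch, again assuming the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'C30': ['AJA', 'AJA', 'AJA', 'ABA', 'ABA', 'ABA', 'BOB'],
    'C25': ['EJE', 'EJE', 'EJE', 'EBE', 'EBE', 'EBE', 'BIB'],
    'C20': ['IJI', 'IJI', 'IJI', 'IBI', 'IBI', 'IBI', 'BEB'],
    'C15': ['OJO', 'OJO', 'OJO', 'OBO', 'OBO', 'OBO', 'BAB'],
    'C10': ['UJU', 'UJU', 'UJU', 'UBU', 'UBU', 'UNU', 'BUB'],
}, index=range(1, 8))

# Within each C30 group, a column is consistent when it holds exactly one
# distinct value; transform('nunique') returns that count on every row of
# the group, so any count > 1 flags all rows of the mismatched group
mismatch = df.groupby('C30').transform('nunique').gt(1).any(axis=1)
print(df[mismatch])
```

Here all three ABA rows (4, 5 and 6) are flagged, because the mismatch in C10 makes the whole group inconsistent.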
Upvotes: 1