Compare or diff two pandas columns element wise

Question

I am new to pandas (but not to data science or Python). This question is not only about how to solve this specific problem, but also about how to handle problems like this the pandas way.

Please feel free to improve the title of this question, because I am not sure what the correct terms are.

Here is my MWE

#!/usr/bin/env python3

import pandas as pd

data = {'A': [1, 2, 3, 3, 1, 4],
        'B': ['One', 'Two', 'Three', 'Three', 'Eins', 'Four']}

df = pd.DataFrame(data)

print(df)

Resulting in

   A      B
0  1    One
1  2    Two
2  3  Three
3  3  Three
4  1   Eins
5  4   Four

My assumption is that whenever the value in column A is 1, the value in column B is always One, and so on for the other values.

I want to prove that assumption.

Secondly, I also assume that if my first assumption is incorrect, this is not an error; there are valid (human) reasons for it. For example, see row index 4, where the A value 1 is paired with Eins (and not One) in column B.

Because of that I also need to see and explore the cases where my assumption is incorrect.

Update of the question: This data is only an example. In the real world I do not know the pairing of the two columns in advance. Because of that, solutions like this do not work in my case:

df.loc[df['A'] == 1, 'B']

I do not know how many distinct values column A contains, or which ones.

I do not know how to do that with pandas. How would a pandas professional solve this?

My approach would be to use pure Python code with list(), set() and some iterations. ;)

jezrael · Accepted Answer

If the goal is to test whether each value in A has exactly one unique value in B, and to return all rows where that fails, use DataFrameGroupBy.nunique to count the unique values per group. Calling it through GroupBy.transform repeats the per-group count on every row of the group, so you can filter the rows whose count is not 1, meaning there are 2 or more unique B values for that A:

# keep rows whose A group has more than one distinct B value
df1 = df[df.groupby('A').B.transform('nunique').ne(1)]
print(df1)
   A     B
0  1   One
4  1  Eins
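
To see why the one-liner above works, it can help to split it into its intermediate steps. The following is only a minimal sketch using the example data from the question; the names n_unique, mask and violations are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 3, 1, 4],
                   'B': ['One', 'Two', 'Three', 'Three', 'Eins', 'Four']})

# Step 1: count the distinct B values in each A group; transform()
# repeats that count on every row belonging to the group.
n_unique = df.groupby('A')['B'].transform('nunique')
print(n_unique.tolist())  # [2, 1, 1, 1, 2, 1]

# Step 2: boolean mask, True where the group has more than one distinct B.
mask = n_unique.ne(1)

# Step 3: boolean indexing keeps only the rows that break the assumption.
violations = df[mask]
print(violations)
```

Because transform returns a Series aligned with the original index, the mask can be used directly for boolean indexing, with no merge back onto df.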

if df1.empty:
    print('My assumption is good')
else:
    print('My assumption is wrong')
    print(df1)
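
Since the question also asks to explore the cases where the assumption fails, a compact summary per A value may be useful next to the row filter. This is a hedged sketch rather than part of the accepted answer; it assumes the same example data, and the names pairs and ambiguous are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 3, 1, 4],
                   'B': ['One', 'Two', 'Three', 'Three', 'Eins', 'Four']})

# Reduce to the distinct (A, B) pairs, then collect the B values
# seen for each A into a list.
pairs = df.drop_duplicates(['A', 'B']).groupby('A')['B'].agg(list)

# Keep only the A values that are paired with more than one B value.
ambiguous = pairs[pairs.map(len) > 1]
print(ambiguous)
```

An empty ambiguous Series would mean that every A value maps to exactly one B value; otherwise each entry lists the competing B values for that A.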
