Chris
Chris

Reputation: 515

Find where column matches more than one in another column

I need results to have a 1:1 cardinality so I need to test if a value in COL1 exists more than once in COL2

COL1    COL2
A   1
B   2
B   2
B   3
C   4
D   5
E   5
E   5

Using Python (preferrably Pandas unless a better way exists), I want to see all rows where a value in COL1 has more than one match in COL2? In the example above, I want to know when COL1=B has more than 1 match in COL2 (i.e the cardinality in COL1 = B matches/joins with COL2 = 2 & also 3?

Upvotes: 1

Views: 67

Answers (1)

cs95
cs95

Reputation: 402473

If you just want the rows that violate this condition, use groupby and check with nunique:

df[df.groupby('COL1').COL2.transform('nunique') > 1]

Or, with groupby, nunique, and map:

df[df.COL1.map(df.groupby('COL1').COL2.nunique()) > 1]

  COL1  COL2
1    B     2
2    B     2
3    B     3

If you want a mapping of COL1 value to COL2 values, you can use an additional groupby and apply:

df[df.groupby('COL1').COL2.transform('nunique') > 1].groupby('COL1').COL2.apply(set)

COL1
B    {2, 3}
Name: COL2, dtype: object

And finally, if all you want is the "cardinality" for > 1 COL1 values, use

df.groupby('COL1').COL2.nunique().to_frame().query('COL2 > 1')

      COL2
COL1      
B        2

Upvotes: 1

Related Questions