Reputation: 515
I need results to have a 1:1 cardinality so I need to test if a value in COL1 exists more than once in COL2
COL1 COL2
A 1
B 2
B 2
B 3
C 4
D 5
E 5
E 5
Using Python (preferrably Pandas unless a better way exists), I want to see all rows where a value in COL1 has more than one match in COL2? In the example above, I want to know when COL1=B has more than 1 match in COL2 (i.e the cardinality in COL1 = B matches/joins with COL2 = 2 & also 3?
Upvotes: 1
Views: 67
Reputation: 402473
If you just want the rows that violate this condition, use groupby
and check with nunique
:
df[df.groupby('COL1').COL2.transform('nunique') > 1]
Or, with groupby
, nunique
, and map
:
df[df.COL1.map(df.groupby('COL1').COL2.nunique()) > 1]
COL1 COL2
1 B 2
2 B 2
3 B 3
If you want a mapping of COL1 value to COL2 values, you can use an additional groupby
and apply
:
df[df.groupby('COL1').COL2.transform('nunique') > 1].groupby('COL1').COL2.apply(set)
COL1
B {2, 3}
Name: COL2, dtype: object
And finally, if all you want is the "cardinality" for > 1 COL1 values, use
df.groupby('COL1').COL2.nunique().to_frame().query('COL2 > 1')
COL2
COL1
B 2
Upvotes: 1