user3927312

Reputation: 824

How to get the Intersection and Union of two Series in Pandas with non-unique values?

If I have two Series objects, for example [0,0,1] and [1,0,0], how would I get the intersection and union of the two? They contain only booleans, so the values are non-unique.

I have a large Boolean matrix. I've minhashed it, and now I'm trying to find the false positives and negatives, which I think means I have to compute the Jaccard similarity for each original pair.

Upvotes: 1

Views: 3956

Answers (1)

Bharath M Shetty

Reputation: 30605

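Since you say they are booleans, use NumPy's logical_and and logical_or, or the & and | operators directly on the Series. The size of the intersection divided by the size of the union is the Jaccard similarity, i.e.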

import numpy as np
import pandas as pd

y1 = pd.Series([1,0,1,0])
y2 = pd.Series([1,0,0,1])

# Numpy approach 
intersection = np.logical_and(y1.values, y2.values)
union = np.logical_or(y1.values, y2.values)
intersection.sum() / union.sum()
# 0.33333333333333331

# Pandas approach 
sum(y1 & y2) / sum(y1 | y2)
# 0.33333333333333331
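
If you need this for every pair of columns in a large matrix, something along these lines might help. It is only a sketch: the helper names jaccard and pairwise_jaccard are made up for illustration, it assumes the matrix holds 0/1 values with one column per original item, and returning NaN when both inputs are all zeros is a choice I made, not a convention.

```python
import numpy as np
import pandas as pd


def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 (or boolean) sequences."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both inputs are all False; the ratio is undefined, so return NaN (assumption).
        return float("nan")
    return float(np.logical_and(a, b).sum() / union)


def pairwise_jaccard(matrix):
    """Jaccard similarity between every pair of columns of a 0/1 matrix."""
    m = np.asarray(matrix, dtype=bool).astype(np.int64)
    inter = m.T @ m                     # inter[i, j] = rows where both columns are 1
    counts = m.sum(axis=0)              # number of 1s in each column
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, np.nan)


y1 = pd.Series([1, 0, 1, 0])
y2 = pd.Series([1, 0, 0, 1])
score = jaccard(y1, y2)                 # 1 shared position out of 3 in the union
sims = pairwise_jaccard(np.column_stack([y1, y2]))
```

Here sims[i, j] holds the similarity between columns i and j. The matrix product computes all intersections at once, which should be much faster than looping over pairs, though for a very wide matrix the full pairwise result may not fit in memory.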

Upvotes: 2
