AML

Reputation: 325

How to implement Jaccard similarity in C#

I'm having trouble calculating the Jaccard similarity of two sets represented as bit vectors:

v1 = 10111

v2 = 10011

Size of intersection = 3 (how do we find this?)

Size of union = 4 (how do we find this?)

Jaccard similarity = (intersection/union) = 3/4

But I don't understand how to compute the "intersection" and "union" of the two vectors.

Please help me.

Upvotes: 1

Views: 2174

Answers (1)

Jon Skeet

Reputation: 1500873

Presumably your definitions of "intersection" and "union" are "number of bits set in both values" and "number of bits set in either value", which gives (assuming you're using something like int or long for the vectors):

int intersection = CountBits(v1 & v2);
int union = CountBits(v1 | v2);

Next you just need to implement CountBits. This Stack Overflow question can help you there.
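As a rough illustration of one way CountBits could look, here is a minimal sketch that clears the lowest set bit in a loop. The class name JaccardBits, the choice of uint for the vectors, and returning 1.0 when both vectors are empty are all assumptions made for this example, not something the question specifies.

```csharp
using System;

static class JaccardBits
{
    // Counts set bits by clearing the lowest set bit until none remain.
    public static int CountBits(uint value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }

    // Jaccard similarity of two bit vectors stored in uint values.
    public static double Jaccard(uint v1, uint v2)
    {
        int intersection = CountBits(v1 & v2);
        int union = CountBits(v1 | v2);
        // Assumption: two empty vectors are treated as identical (1.0).
        return union == 0 ? 1.0 : (double)intersection / union;
    }
}
```

With the vectors from the question, JaccardBits.Jaccard(Convert.ToUInt32("10111", 2), Convert.ToUInt32("10011", 2)) gives 3/4 = 0.75, since v1 & v2 is 10011 (three bits set) and v1 | v2 is 10111 (four bits set).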

Instead of using int or long, you may want to use BitArray. That has And and Or methods, but note that they modify the instance they are called on (and return it), so copy the array first if you need to keep the original. You'd need to work out the best way of counting the bits set in a BitArray of course; array.Cast<bool>().Count(bit => bit) may well work.
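A minimal sketch of the BitArray variant might look like the following. BitArray.And and BitArray.Or write their result into the instance they are called on, so the sketch works on copies. The class name JaccardBitArray and the 1.0 result for two empty arrays are assumptions for illustration, and both arrays are assumed to have the same length (And and Or throw otherwise).

```csharp
using System.Collections;
using System.Linq;

static class JaccardBitArray
{
    // Counts the bits that are set, using the LINQ approach from the answer.
    public static int CountBits(BitArray bits)
    {
        return bits.Cast<bool>().Count(bit => bit);
    }

    public static double Jaccard(BitArray v1, BitArray v2)
    {
        // Copy v1 before each operation so the caller's arrays stay unchanged.
        BitArray intersection = new BitArray(v1).And(v2);
        BitArray union = new BitArray(v1).Or(v2);

        int unionCount = CountBits(union);
        // Assumption: two empty arrays are treated as identical (1.0).
        return unionCount == 0 ? 1.0 : (double)CountBits(intersection) / unionCount;
    }
}
```

For long vectors the LINQ count boxes every bit, so a manual loop over the indices may be faster; that is a performance guess rather than something measured here.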

Upvotes: 4
