I have a problem calculating the Jaccard similarity of sets represented as bit vectors:
v1 = 10111
v2 = 10011
Size of intersection = 3 (how do we find this?)
Size of union = 4 (how do we find this?)
Jaccard similarity = intersection / union = 3/4
But I don't understand how to find the "intersection" and "union" of the two vectors.
Please help me.
Upvotes: 1
Presumably your definitions of "intersection" and "union" are "number of bits set in both values" and "number of bits set in either value", which (assuming you're using something like int or long for the vectors) is:
int intersection = CountBits(v1 & v2);
int union = CountBits(v1 | v2);
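Applied to the vectors from the question, the bitwise AND keeps only the positions set in both, and the bitwise OR keeps the positions set in either. A worked version, assuming both strings list bits in the same order (which order doesn't affect the counts):

```latex
\begin{aligned}
v_1 \mathbin{\&} v_2 &= 10111 \mathbin{\&} 10011 = 10011 &&\Rightarrow \text{intersection} = 3 \\
v_1 \mathbin{|} v_2  &= 10111 \mathbin{|} 10011 = 10111 &&\Rightarrow \text{union} = 4 \\
J(v_1, v_2) &= \frac{3}{4}
\end{aligned}
```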
Next you just need to implement CountBits. This Stack Overflow question can help you there.
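A minimal C# sketch of one possible CountBits, plus the Jaccard computation, for vectors packed into a ulong. The class name BitJaccard is hypothetical, the bit-clearing loop (Kernighan's trick) is just one way to count bits, and returning 1.0 when both vectors are all zeros is an assumed convention for two empty sets:

```csharp
using System;

// Hypothetical helper class; the name is illustrative only.
static class BitJaccard
{
    // Counts set bits: x &= x - 1 clears the lowest set bit,
    // so the loop body runs once per 1-bit.
    public static int CountBits(ulong x)
    {
        int count = 0;
        while (x != 0)
        {
            x &= x - 1;
            count++;
        }
        return count;
    }

    // Jaccard similarity = bits set in both / bits set in either.
    public static double Jaccard(ulong v1, ulong v2)
    {
        int intersection = CountBits(v1 & v2);
        int union = CountBits(v1 | v2);
        // Assumption: two all-zero vectors (empty sets) count as identical.
        return union == 0 ? 1.0 : (double)intersection / union;
    }

    static void Main()
    {
        ulong v1 = 0b10111; // binary literals need C# 7.0 or later
        ulong v2 = 0b10011;
        Console.WriteLine(CountBits(v1 & v2)); // 3
        Console.WriteLine(CountBits(v1 | v2)); // 4
        Console.WriteLine(Jaccard(v1, v2));
    }
}
```

On .NET Core 3.0 and later, System.Numerics.BitOperations.PopCount can replace the hand-written loop.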
Instead of using int or long, you may want to use BitArray. That has And and Or methods; be aware that they modify the BitArray they're called on (and return it), so clone it first if you still need the original values. You'd also need to work out the best way of counting the bits set in a BitArray, of course; just array.Cast<bool>().Count(bit => bit) may well work.
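A sketch of the BitArray route, counting bits with the Cast<bool>().Count(...) approach. The class name BitArrayJaccard is hypothetical, cloning before And/Or is one assumed way to keep the inputs unchanged, and the empty-set convention matches the assumption above. Both arrays must have the same Length, or And/Or throw an ArgumentException:

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical helper class; the name is illustrative only.
static class BitArrayJaccard
{
    // Counts the true entries by enumerating the BitArray as bools.
    public static int CountBits(BitArray bits) => bits.Cast<bool>().Count(bit => bit);

    public static double Jaccard(BitArray a, BitArray b)
    {
        // And/Or overwrite the instance they are called on and return it,
        // so operate on clones to leave a and b untouched.
        int intersection = CountBits(((BitArray)a.Clone()).And(b));
        int union = CountBits(((BitArray)a.Clone()).Or(b));
        // Assumption: two all-false arrays (empty sets) count as identical.
        return union == 0 ? 1.0 : (double)intersection / union;
    }

    static void Main()
    {
        // Index 0 holds the leftmost character of "10111"; any consistent order works.
        var v1 = new BitArray(new[] { true, false, true, true, true });
        var v2 = new BitArray(new[] { true, false, false, true, true });
        Console.WriteLine(Jaccard(v1, v2));
        Console.WriteLine(CountBits(v1)); // 4: v1 was not modified
    }
}
```

The clones cost an extra copy per comparison; if the inputs are throwaway, calling And/Or on them directly avoids that.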
Upvotes: 4