Aqeel Abbas

Reputation: 169

Implementing Jaccard similarity in C#

I am trying to understand "Jaccard similarity" between two arrays of type double whose values are greater than zero and less than one.

So far I have searched many websites, and what I found is that both arrays should be the same size (the number of elements in array 1 should equal the number of elements in array 2). However, my two arrays have different numbers of elements. Is there any way to implement "Jaccard similarity" in that case?

Upvotes: 2

Views: 4450

Answers (3)

ilia

Reputation: 640

Sorry for necroposting, but the answer above was marked as the correct one. The Jaccard similarity coefficient from @AgapwIesu's answer can be at most 0.5, even when the collections are fully identical. At the very least, you need to multiply the numerator by 2 to normalize it, like this:

var CommonNumbers = from a in A.AsEnumerable<double>()
                    join b in B.AsEnumerable<double>() on a equals b
                    select a;
double JaccardIndex = 2*(((double) CommonNumbers.Count()) /
                       ((double) (A.Count() + B.Count())));

Please note that this similarity coefficient is not the intersection divided by the union, as defined on Wikipedia. If you want the intersection divided by the union using LINQ, you can try this code:

private static double JaccardIndex(IEnumerable<double> A, IEnumerable<double> B)
{
    return (double)A.Intersect(B).Count() / (double)A.Union(B).Count();
}

Take into account that Union and Intersect work on unique elements, so you should be careful when working with collections that contain duplicates:

List<int> A = new List<int>() { 1, 1, 1, 1 };
List<int> B = new List<int>() { 1, 1, 1, 1 };
Console.WriteLine(A.Union(B).Count()); // = 1, not 4
Console.WriteLine(A.Intersect(B).Count()); // = 1, not 4
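
Putting the points above together, here is a minimal sketch of the intersection-over-union version that makes the set semantics explicit and works for arrays of different lengths. The class name JaccardSketch and the choice to return 1.0 for two empty inputs are my own assumptions, not part of any library. Like the LINQ versions above, it compares doubles with exact equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class JaccardSketch
{
    // Set-based Jaccard index: |A ∩ B| / |A ∪ B|.
    // Duplicates are collapsed, and the inputs may have different lengths.
    public static double Index(IEnumerable<double> a, IEnumerable<double> b)
    {
        var setA = new HashSet<double>(a);
        var setB = new HashSet<double>(b);

        // Count the distinct values of A that also occur in B.
        int intersection = setA.Count(x => setB.Contains(x));

        // For sets, |A ∪ B| = |A| + |B| - |A ∩ B|.
        int union = setA.Count + setB.Count - intersection;

        // Assumption: two empty inputs are treated as identical (1.0);
        // the plain formula would be 0/0 here.
        return union == 0 ? 1.0 : (double)intersection / union;
    }
}
```

For example, with A = { 0.1, 0.2, 0.3 } and B = { 0.2, 0.3, 0.4, 0.5 } the intersection has 2 elements and the union has 5, so JaccardSketch.Index(A, B) gives 2/5 = 0.4.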

Upvotes: 3

user4843530

Using C#'s LINQ ...

Say you have an array of doubles named A and another named B. This will give you the Jaccard index:

var CommonNumbers = from a in A.AsEnumerable<double>()
                    join b in B.AsEnumerable<double>() on a equals b
                    select a;
double JaccardIndex = (((double) CommonNumbers.Count()) /
                       ((double) (A.Count() + B.Count())));

The first statement gets a list of numbers that appear in both arrays. The second computes the index - that is just the size of the intersection (how many numbers appear in both arrays) divided by the size of the union (size, or rather count, of the one array plus the count of the other).

Upvotes: 4

user4843530

Jaccard similarity is an index of the size of intersection between two sets, divided by the size of the union. In your case, you'd have to write the code to find out how many elements appear in both arrays, then divide that by the sum of the size of both arrays.

Upvotes: 2
