Reputation: 255
I have a table:
     x  y  z
A    2  0  3
B    0  3  0
C    0  0  4
D    1  4  0
I want to calculate the Jaccard similarity in MATLAB between the vectors A, B, C and D. The formula is:
jaccard(x,y) = |x intersect y| / (|x| + |y| - |x intersect y|)
In this formula, |x| and |y| denote the number of nonzero items in each vector. For example, |A| is 2, |B| and |C| are each 1, and |D| is 2.
|x intersect y| denotes the number of items that are nonzero in both vectors. |A intersect B| is 0, and |A intersect D| is 1, because only the value of x is nonzero in both.
For example, jaccard(A,D) = 1/3 = 0.33, since A and D share 1 nonzero item out of the 3 items that are nonzero in at least one of them.
How can I implement this in MATLAB?
Upvotes: 5
Views: 12909
Reputation: 7751
MATLAB has a function that computes the Jaccard distance, pdist, which is part of the Statistics Toolbox.
Here is an example on two random binary vectors:
X = rand(2,100);
X(X>0.5) = 1;
X(X<=0.5) = 0;
JD = pdist(X,'jaccard') % jaccard distance
JI = 1 - JD; % jaccard index
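Applied to the table in the question, the same approach might look like the sketch below. The variable names T and P are my own. As far as I understand the documentation, the 'jaccard' option of pdist compares coordinate values for equality rather than only checking whether they are nonzero, so the counts are binarized first; that step is an assumption on my part and worth checking against your MATLAB version.

```matlab
% Rows are the vectors A, B, C, D from the question; columns are x, y, z.
T = [2 0 3;
     0 3 0;
     0 0 4;
     1 4 0];

P  = double(T ~= 0);          % 1 where an item is nonzero, 0 otherwise
JD = pdist(P, 'jaccard');     % pairwise distances in the order AB AC AD BC BD CD
JI = 1 - squareform(JD);      % 4x4 similarity matrix with ones on the diagonal
```

JI(1,4) is then the similarity between A and D, which should come out as 1/3, matching the hand calculation in the question.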
EDIT
A calculation that does not require the Statistics Toolbox:
a = X(1,:);
b = X(2,:);
JD = 1 - sum(a & b)/sum(a | b)
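For the table in the question, the toolbox-free calculation can be extended to all pairs at once. This is only a sketch: the names T, P, inter, n, uni and J are my own, and it assumes every row has at least one nonzero item (an all-zero row would produce 0/0 = NaN).

```matlab
% Rows are the vectors A, B, C, D from the question; columns are x, y, z.
T = [2 0 3;
     0 3 0;
     0 0 4;
     1 4 0];

P     = double(T ~= 0);              % 1 where an item is nonzero
inter = P * P';                      % inter(i,j) = items nonzero in both row i and row j
n     = sum(P, 2);                   % n(i) = number of nonzero items in row i
uni   = bsxfun(@plus, n, n') - inter; % |x| + |y| - |x intersect y|
J     = inter ./ uni                 % pairwise Jaccard similarity (index), not distance
```

J(1,4) is the similarity between A and D (1/3), and 1 - J gives the corresponding distances. bsxfun is used instead of n + n' so the code also runs on releases without implicit expansion.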
Upvotes: 6