Reputation: 4212
Suppose I have these two lists:
a = ['a','b','c','a','a']
b = ['a','b','d']
I need to calculate the Jaccard distance = (union - intersect)/union, but I know there are going to be duplicates in each list and I want to count them, so the intersection length for the example would be 2 and the Jaccard distance = (8-2)/8.
How can I do that? My first thought is to join the lists and then remove elements one by one...
UPDATE: I probably should have stressed more that I need to count duplicates;
here is my working solution, but it is quite ugly:
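import collections

a = [1, 2, 3, 1, 1]
b = [2, 1, 1, 6, 5]
aX = collections.Counter(a)
bX = collections.Counter(b)
# elements that occur in both lists
r1 = [x for x in aX if x in bX]
print(r1)  # [1, 2]
# intersection size with duplicates: the smaller count of each shared element
print(sum(min(aX[x], bX[x]) for x in r1))  # 3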
Upvotes: 0
Views: 788
Reputation: 4861
To get the Jaccard distance between two lists a and b:
def jaccard_distance(a, b):
    # converting to sets discards duplicate elements
    a = set(a)
    b = set(b)
    c = a.intersection(b)
    return float(len(a) + len(b) - len(c)) / (len(a) + len(b))
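For comparison, the conventional set-based Jaccard distance is 1 - |A ∩ B| / |A ∪ B|, where the union counts each shared element once. A minimal sketch under that definition; the name `set_jaccard_distance` is only illustrative, and duplicates are ignored because both inputs are converted to sets:

```python
def set_jaccard_distance(a, b):
    """Conventional Jaccard distance on sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty inputs are treated as identical
    return 1.0 - len(a & b) / float(len(union))

print(set_jaccard_distance(['a', 'b', 'c', 'a', 'a'], ['a', 'b', 'd']))  # 0.5
```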
Upvotes: 1
Reputation: 4212
here is my working solution, but it is quite ugly:
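import collections

a = [1, 2, 3, 1, 1]
b = [2, 1, 1, 6, 5]
aX = collections.Counter(a)
bX = collections.Counter(b)
# elements that occur in both lists
r1 = [x for x in aX if x in bX]
print(r1)  # [1, 2]
# intersection size with duplicates: the smaller count of each shared element
print(sum(min(aX[x], bX[x]) for x in r1))  # 3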
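The same idea can probably be written more compactly. `Counter` supports the `&` operator, which keeps the smaller count of every shared element. Below is a rough sketch of the whole calculation, assuming the question's definition where the "union" is the combined length of both lists; the function name `multiset_jaccard_distance` is made up for illustration:

```python
from collections import Counter

def multiset_jaccard_distance(a, b):
    """(union - intersect) / union, counting duplicates as in the question."""
    # & keeps min(count_in_a, count_in_b) for every element present in both
    intersect = sum((Counter(a) & Counter(b)).values())
    # the question treats the union as the combined length of both lists
    union = len(a) + len(b)
    if union == 0:
        return 0.0  # assumption: two empty lists are treated as identical
    return (union - intersect) / float(union)

print(multiset_jaccard_distance(['a', 'b', 'c', 'a', 'a'], ['a', 'b', 'd']))  # 0.75
```

For the lists from the question this gives (8-2)/8 = 0.75.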
Upvotes: 0
Reputation: 55
a = ['a','b','c','a','a']
b = ['a','b','d']
c = list(set(b).intersection(a))
# c holds 'a' and 'b' (set order is not guaranteed)
Note that sets discard duplicates!
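If the duplicates need to survive, a `collections.Counter` can stand in for the set. A small sketch contrasting the two, using the integer lists from the question:

```python
from collections import Counter

a = [1, 2, 3, 1, 1]
b = [2, 1, 1, 6, 5]

# set intersection: each shared value appears once
print(sorted(set(a) & set(b)))  # [1, 2]

# Counter intersection: each shared value keeps its smaller count
common = Counter(a) & Counter(b)
print(sorted(common.elements()))  # [1, 1, 2]
```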
Upvotes: 1