Reputation: 2163
What is the fastest way to check if a list is inside of a nested list, full iteration or using in
?
Given
A = [['Yes','2009','Me'],['Yes','2009','You'],['No','2009','You']]
B = [['No','2009','Me'],['Yes','2009','You'],['No','2009','You']]
Count number of duplicates between A and B.
I see either iterating over all elements:
for i in range(len(A)):
for j in range(len(B)):
if A[i] == B[j]:
count+=1
Or using in with one element iteration:
for i in range(len(A)):
if A[i] in B:
count+=1
With the actual lengths of A and B being over 100,000 arrays, and each contains 4 elements, are there any specific functions or strategies to do this comparison efficiently?
With my data, option 1 is green, option 2 is blue, the answer from qqvc is red, user1245262 answer is turquoise (it is at the bottom with very fast, linear complexity) y axis is seconds, x axis is number of 4 element arrays being compared in each list.
Upvotes: 0
Views: 94
Reputation: 23773
option z:
sum(thing in B for thing in A)
option y:
sum(itertools.starmap(operator.eq, itertools.product(A,B)))
Upvotes: 0
Reputation: 7505
You might try using sets. Consider:
>>> A = [['Yes','2009','Me'],['Yes','2009','You'],['No','2009','You']]
>>> B = [['No','2009','Me'],['Yes','2009','You'],['No','2009','You']]
sets require hashable elements, so you need to convert the lists to tuples. I'm assuming that your lists are all in some particular order, so that ['dog',2,'mouse'] will always appear that way, and not as ['mouse', 2, 'dog']. Then,
>>> AA = set(map(tuple,A))
>>> BB = set(map(tuple,B))
Then,
>>> BB.intersection(AA)
set([('No', '2009', 'You'), ('Yes', '2009', 'You')])
Since you only seem to want the size of the intersection,
>>> len(BB.intersection(AA))
2
This might be faster than your looping, but you'd have to check it.
Upvotes: 1