Reputation: 4374
I just read another users question while looking for a way to compute the differences in two lists.
Python, compute list difference
My question is why would I do
def diff(a,b):
b = set(b)
return [aa for aa in a if aa not in b]
rather than doing
def diff(a,b):
tmp = []
for i in a:
if(i not in b):
tmp.append(i)
return tmp
edit: just noticed the second diff function actually returned the similarities. It should be correct now.
Upvotes: 7
Views: 426
Reputation: 425
I've done some tests:
test_lists.py
a = range(1, 1000)
b = range(2, 1002)
tmp = []
for i in a:
if(i not in b):
tmp.append(i)
test_set_list_comprehensions.py
a = range(1, 1000)
b = range(2, 1002)
b = set(b)
[aa for aa in a if aa not in b]
test_set.py
a = range(1, 1000)
b = range(2, 1002)
list(set(a).difference(set(b)))
And that's what timeit says:
~$ python -m timeit 'import test_lists'
1000000 loops, best of 3: 0.671 usec per loop
~$ python -m timeit 'import test_set_list_comprehension'
1000000 loops, best of 3: 0.766 usec per loop
~$ python -m timeit 'import test_set'
1000000 loops, best of 3: 0.656 usec per loop
So the best one seems to be:
test_set.py
a = range(1, 1000)
b = range(2, 1002)
list(set(a).difference(set(b)))
Upvotes: 2
Reputation: 143072
List comprehension is generally regarded as more efficient than using regular list operations when creating a list. If memory is a concern, you may want to use a generator instead.
This provides a bit of information re performances of for
-loops, map
and list comprehension
. @benhoyt also provided a useful link comparing loops vs list comprehension.
However, please note that if performance is a specific concern, it might be worth your while to time/benchmark your various options to select the best option for your specific requirements.
Upvotes: 2
Reputation: 36476
Just from an algorithmic perspective, it takes O(n)
to construct the set and O(n)
to do the list comprehension (since testing if an element is contained in a set is O(1)
). However in the second example, it would take O(n^2)
to loop through both lists. So regardless of the programming language, the first approach is superior.
Also, list comprehensions in python are inherently faster than a for loop. This reduces the constant factor even more (and significantly so too). The reason why can be summarized in this post which I quote here:
The fact that list comprehensions can only be comprised of expressions, not statements, is a considerable factor, as much less work is required behind the scenes for each iteration. Another factor is that the underlying iteration mechanism for list comprehensions is much closer to a C loop than execution of a for loop.
Upvotes: 8
Reputation: 500595
The main difference between the two options is that the one that uses set
is asymptotically much more efficient.
Under reasonably favourable conditions, looking up an item in a set can be done in O(1)
time; looking up an item in a list requires O(n)
time.
The second, less significant, difference is that one version uses a list comprehension while the other uses a for
loop. List comprehensions tend to produce more compact code. They also tend to be more efficient (although if performance is a concern, the only way to get an accurate picture is by running benchmarks).
Upvotes: 5
Reputation: 86738
Using the set is usually faster because it only has to iterate b
once, while your example has to iterate b
once for each element in a
.
Upvotes: 2