Jake
Jake

Reputation: 4374

Why use a set for list comparisons?

I just read another users question while looking for a way to compute the differences in two lists.

Python, compute list difference

My question is why would I do

def diff(a,b):
    b = set(b)
    return [aa for aa in a if aa not in b]

rather than doing

def diff(a,b):
    tmp = []
    for i in a:
        if(i not in b):
            tmp.append(i)
return tmp

edit: just noticed the second diff function actually returned the similarities. It should be correct now.

Upvotes: 7

Views: 426

Answers (5)

raulcumplido
raulcumplido

Reputation: 425

I've done some tests:

test_lists.py

a = range(1, 1000)
b = range(2, 1002)

tmp = []
for i in a:
    if(i not in b):
        tmp.append(i)

test_set_list_comprehensions.py

a = range(1, 1000)
b = range(2, 1002)

b = set(b)
[aa for aa in a if aa not in b]

test_set.py

a = range(1, 1000)
b = range(2, 1002)
list(set(a).difference(set(b)))

And that's what timeit says:

~$ python -m timeit 'import test_lists'
1000000 loops, best of 3: 0.671 usec per loop
~$ python -m timeit 'import test_set_list_comprehension'
1000000 loops, best of 3: 0.766 usec per loop
~$ python -m timeit 'import test_set'
1000000 loops, best of 3: 0.656 usec per loop

So the best one seems to be:

test_set.py

a = range(1, 1000)
b = range(2, 1002)
list(set(a).difference(set(b)))

Upvotes: 2

Levon
Levon

Reputation: 143072

List comprehension is generally regarded as more efficient than using regular list operations when creating a list. If memory is a concern, you may want to use a generator instead.

This provides a bit of information re performances of for-loops, map and list comprehension. @benhoyt also provided a useful link comparing loops vs list comprehension.

However, please note that if performance is a specific concern, it might be worth your while to time/benchmark your various options to select the best option for your specific requirements.

Upvotes: 2

tskuzzy
tskuzzy

Reputation: 36476

Just from an algorithmic perspective, it takes O(n) to construct the set and O(n) to do the list comprehension (since testing if an element is contained in a set is O(1)). However in the second example, it would take O(n^2) to loop through both lists. So regardless of the programming language, the first approach is superior.

Also, list comprehensions in python are inherently faster than a for loop. This reduces the constant factor even more (and significantly so too). The reason why can be summarized in this post which I quote here:

The fact that list comprehensions can only be comprised of expressions, not statements, is a considerable factor, as much less work is required behind the scenes for each iteration. Another factor is that the underlying iteration mechanism for list comprehensions is much closer to a C loop than execution of a for loop.

Upvotes: 8

NPE
NPE

Reputation: 500595

The main difference between the two options is that the one that uses set is asymptotically much more efficient.

Under reasonably favourable conditions, looking up an item in a set can be done in O(1) time; looking up an item in a list requires O(n) time.

The second, less significant, difference is that one version uses a list comprehension while the other uses a for loop. List comprehensions tend to produce more compact code. They also tend to be more efficient (although if performance is a concern, the only way to get an accurate picture is by running benchmarks).

Upvotes: 5

Gabe
Gabe

Reputation: 86738

Using the set is usually faster because it only has to iterate b once, while your example has to iterate b once for each element in a.

Upvotes: 2

Related Questions