qiaop
qiaop

Reputation: 169

Which one is faster: iterating over a set and iterating over a list

Say that I have a list of strings and a set of the same strings:

l = [str1, str2, str3, str4, ...]
s = set([str1, str2, st3, str4, ...])

I need to run a string comparison with a phrase that I have: comparephrase

I need to iterate over all the elements in the list or the set and generate a ratio between the comparephrase and the compared string. I know that set() is faster when we are doing a membership test. However, I am not doing a membership test but comparing the phrase that I have and the strings in the list/set. Does set() still offer a faster speed? If so, why? It seems to me that this set is actually a set with a list inside. Wouldn't that take the long time since we're iterating over the list within the set?

Upvotes: 10

Views: 9942

Answers (4)

sausix
sausix

Reputation: 289

The test in the accepted answer is not really representative as Aditya Shaw stated.

Let me explain the technical differences between iterating lists and sets roughly in an easy way.


Iterating lists
Lists have their elements organized by an order and an index "by design", which can be iterated easily. Access by an index is fast because it bases on few cheap memory read operations.

Iterating sets
Sets iterate slower because their element access is done by hashes.
Imagine a big tree with many branches and each leaf is an element. A hash is being translated into "directions" to traverse all branches until reaching the leaf.
Finding a leaf or element is still fast but slower compared to a simple index access of a list.
Sets have no linked items so an iteration cannot jump to the "next" item easily like a list. It has to start from the root of the tree on each iteration.


Sets (and dicts) are contrary to lists.
Each kind has a primary use case. Lists and sets swap their advantages on finding elements directly.

Does a list contain an element?
The list has to iterate over all its elements until it finds a match. That depends heavily on the list size. If the element is near the beginning, it gets found quite fast even on large lists. If an element is not in the list or near the end, the list gets iterated completely or until a match near the end.

Does a set contain an element?
It just has to traverse let's say 5 branches to look if there's a leaf. Even on large sets, the number of branches to traverse is relative low.

Why not make and use a universal collection type?
If a set had an internal list index, the set's operations would be slower because the list had to be updated and/or checked.

If a list had an internal set to find items faster, list operations would be slower because the hashes had to be updated and/or checked.

Data found behind the hash and behind the index also may be inconsistent because of duplicate management. And that also increases memory usage at all.

Upvotes: 5

Aditya
Aditya

Reputation: 363

Iterating over a List is much much faster than iterating over a set.

The currently accepted answer is using a very small set and list and hence, the difference is negligible there.

The following code explains it:

>>> import timeit
>>> l = [ x*x for x in range(1, 400)] 
>>> s = set(l)
>>> timeit.timeit("for i in s: pass", "from __main__ import s")
12.152284085999781
>>> timeit.timeit("for i in l: pass", "from __main__ import l")
5.460189446001095
>>> timeit.timeit("if 567 in l: pass", "from __main__ import l")
6.0497558240003855
>>> timeit.timeit("if 567 in s: pass", "from __main__ import s")
0.04609546199935721

I do not know what makes set iteration slow, but the fact is evident from the output above.

Upvotes: 13

hlt
hlt

Reputation: 6317

I've run some tests with timeit, and (while list performs slightly faster) there is no significant difference:

>>> import timeit
>>> # For the set
>>> timeit.timeit("for i in s: pass", "s = set([1,4,7,10,13])")
0.20565616500061878
>>> # For the list
>>> timeit.timeit("for i in l: pass", "l = [1,4,7,10,13]")
0.19532391999928223

These values stay very much the same (0.20 vs. 0.19) even when trying multiple times.

However, the overhead of creating the sets can be significant.

Upvotes: 1

TheSoundDefense
TheSoundDefense

Reputation: 6935

A Python set is optimized for equality tests and duplicate removal, and thus implements a hash table underneath. I believe this would make it very slightly slower than a list, if you have to compare every element to comparephrase; lists are very good for iterating over every element one after the other. The difference is probably going to be negligible in almost any case, though.

Upvotes: 4

Related Questions