anekix
anekix

Reputation: 2563

python set vs tuple lookup. is Lookup in Tuple O(1)?

recently i saw a question here on python here

1: Python If statement and logical operator issue. There was someone in comments who gave an answer that it can be done like this:

1 in (1, 2, 3) to check if 1 is in present in a collection of items.but according to me this should be much faster 1 in {1, 2, 3}. as you can see there in discussions ,someone of high reputation goes on saying that ( ) is faster for fixed size input. and has faster lookup than { }. i am asking it here because i want to know for my own understanding which one is correct and also i do not get the idea of ( ) being fiexd-size or variable size. i just asked for a refrence in that original question so i can correct myself if i am wrong but the user insits upon clearing my basics of computer science knowledge without givin g a single refrence to his argument that lookup in Tuple is O(1).so i am asking it here.

Upvotes: 2

Views: 3627

Answers (3)

Tom Ekberg
Tom Ekberg

Reputation: 2261

The case for set has been improved in python 3.5 (perhaps earlier). Here is my test function:

def is_valid(x):
  return x in ('name', 'known_as')

The code generated is:

  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               3 (('name', 'known_as'))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE

Changing the tuple to a set generates this code:

  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               3 (frozenset({'name', 'known_as'}))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE

Python 3.7 and 3.8 generate the same code for both cases. The timing results are:

$ python3.5 -m timeit -s "def is_valid(x): return x in {'name', 'known_as'}" "is_valid('')"
10000000 loops, best of 3: 0.0815 usec per loop
$ python3.5 -m timeit -s "def is_valid(x): return x in ('name', 'known_as')" "is_valid('')"
10000000 loops, best of 3: 0.0997 usec per loop

Passing 'name' to is_valid for the tuple case runs in 0.0921 usec per loop, still slower than set.

Upvotes: 4

user4815162342
user4815162342

Reputation: 155036

Although the other commenter is technically correct that x in <constant expression> is O(1) at run-time, it is still interesting to compare performance of sets and tuples of the same size. The discussion can be generalized it to consider different programs containing expressions of the form x in {a, b, c, ...}. If the expression consists of n items, then n can be considered input for big-O analysis, among all possible such programs. (And if still one insisted that it be possible to provide a different n at run-time, simply imagine that the function is created using exec.)

The issue with performance of such expressions is that the Python run-time must construct a one-time set for use in the in test, and then immediately discard it. This is clearly visible in the generated assembly:

>>> import dis
>>> def is_valid(x):
...     return x in {1, 2, 3}
... 
>>> dis.dis(is_valid)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 (2)
              9 LOAD_CONST               3 (3)
             12 BUILD_SET                3
             15 COMPARE_OP               6 (in)
             18 RETURN_VALUE        

Constructing a set of n elements obviously has a cost of at least O(n). In other words, the test using a set, implemented as a literal constant, is not O(1) because the interpreter has to construct the set. This is what the commenter was trying to get across by referring to the cost of construction.

In fact, it gets stranger than that; due to nature of the Python VM, the compiler is allowed to construct tuples consisting only of numeric literals at compile time, which it does:

>>> import dis
>>> def is_valid(x):
...     return x in (1, 2, 3)
... 
>>> dis.dis(is_valid)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               4 ((1, 2, 3))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE        

Note how the (1, 2, 3) constant doesn't need to be built item by item - that is because it was already built by the compiler and inserted into the function's environment. As a result, this implementation of is_valid might actually be faster than the one using a set! This is easy to test:

$ python -m timeit -s 'def is_valid(x): return x in {1, 2, 3}' 'is_valid(-1)'
10000000 loops, best of 3: 0.189 usec per loop
$ python -m timeit -s 'def is_valid(x): return x in (1, 2, 3)' 'is_valid(-1)'
10000000 loops, best of 3: 0.128 usec per loop

Again, the other commenter was right.

Increasing the size of the set/tuple doesn't tip the balance in favor of set - it is always more expensive to construct a set of n items and then perform a blindingly-fast constant-time search, than to just iterate over a pre-created tuple looking for an item. This is because set creation has to allocate the set (possibly several times) and calculate hashes of all the items. Although both tuple search and set size are O(n), the set's one has a larger constant factor.

The correct way to implement an O(1) lookup would entail manually implementing the optimization that the compiler does automatically for tuples:

_valid = {1, 2, 3}
def is_valid(x):
    return x in _valid

Comparing this code with the equivalent code using a tuple, the set is always faster, even with a small number of items. As the number of items grows, set becomes the clear winner with its O(1) lookup.

Upvotes: 3

user2357112
user2357112

Reputation: 280963

When you say something like O(n), you have to say what n is. Here, n is the length of the tuple... but the tuple is not an input. You're not taking the tuple as an argument or anything. n is always 2 in the conversation you linked, or 3 for your example tuple, so for this particular n, O(n) is the same thing as O(2), or O(1).

As you may have noticed by now, it doesn't make much sense to talk about O(n) when n is a constant. If you had a function like

def in_(element, tup):
    return element in tup

you could say that the runtime is O(n) element comparisons, where n is len(tup), but for something like

usr in ('Y', 'y')

talking about n isn't very useful.

Upvotes: 6

Related Questions