qouify
qouify

Reputation: 3900

Set inclusion faster with set literals

In the following I time 10_000_000 checks of whether if 10 is in {0, ..., 9}.

In the first check I use an intermediate variable and in the second one I use a literal.

import timeit

x = 10
s = set(range(x))
number = 10 ** 7

stmt = f'my_set = {s} ; {x} in my_set'
print(f'eval "{stmt}"')
print(timeit.timeit(stmt=stmt, number=number))

stmt = f'{x} in {s}'
print(f'eval "{stmt}"')
print(timeit.timeit(stmt=stmt, number=number))

Output:

eval "my_set = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} ; 10 in my_set"
1.2576093
eval "10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}"
0.20336140000000036

How is it that the second one is way faster (by a factor of 5-6 approximately)? Is there some runtime optimisation performed by Python, e.g., if the inclusion check if made on a literal? Or maybe is it due to garbage collection (since it is a literal python garbage collects it right after use)?

Upvotes: 3

Views: 141

Answers (1)

ddejohn
ddejohn

Reputation: 8952

You aren't testing the same two things -- in the first test, you're timing two assignments and lookups in addition to the membership test:

In [1]: import dis
   ...: x = 10
   ...: s = set(range(x))

In [2]: dis.dis("x in s")
  1        0 LOAD_NAME                0 (x)
           2 LOAD_NAME                1 (s)
           4 CONTAINS_OP              0
           6 RETURN_VALUE

In [3]: dis.dis("my_set = s; x in my_set")
  1        0 LOAD_NAME                0 (s)
           2 STORE_NAME               1 (my_set)
           4 LOAD_NAME                2 (x)
           6 LOAD_NAME                1 (my_set)
           8 CONTAINS_OP              0
          10 POP_TOP
          12 LOAD_CONST               0 (None)
          14 RETURN_VALUE

# By request
In [4]: dis.dis("s = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; 10 in s")
  1        0 BUILD_SET                0
           2 LOAD_CONST               0 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9}))
           4 SET_UPDATE               1
           6 STORE_NAME               0 (s)
           8 LOAD_CONST               1 (10)
          10 LOAD_NAME                0 (s)
          12 CONTAINS_OP              0
          14 POP_TOP
          16 LOAD_CONST               2 (None)
          18 RETURN_VALUE

The actual difference between using literals and x in s is that the latter needs to go perform a lookup in globals, i.e., the difference is LOAD_NAME vs LOAD_CONST:

In [5]: dis.dis("10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}")
  1        0 LOAD_CONST               0 (10)
           2 LOAD_CONST               1 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9}))
           4 CONTAINS_OP              0
           6 RETURN_VALUE

Times:

In [6]: %timeit x in s
28.5 ns ± 0.792 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit 10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
20.3 ns ± 0.384 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Upvotes: 6

Related Questions