Ch3steR
Ch3steR

Reputation: 20689

Why is a set object stored as a frozenset and a list object as a tuple?

I saw a blog post where it's mentioned "Use func.__code__.co_consts to check all the constants defined in the function".

def func():
    return 1 in {1,2,3}
func.__code__.co_consts
(None, 1, frozenset({1, 2, 3}))

Why did it return a frozenset?

def func():
    return 1 in [1,2,3]
func.__code__.co_consts
(None, 1, (1,2,3))

Why did it return a tuple instead of a list? Every object returned from __code__.co_consts is immutable. Why are the mutable constants made immutable? Why is the first element of the returned tuple always None?

Upvotes: 3

Views: 294

Answers (2)

Omer Tuchfeld
Omer Tuchfeld

Reputation: 3052

Historically, this has been a result of Python doing Peephole optimization.

See this old (now deleted) article from the documentation of some bytecode crate.

Under "Optimizations" it lists optimizations performed by the peephole optimizer, among other optimizations it mentions this one:

BUILD_LIST + COMPARE_OP(in/not in): convert list to tuple BUILD_SET + COMPARE_OP(in/not in): convert set to frozenset

But in more recent versions of CPython this optimization has been moved so that it is performed at the AST level, rather than the bytecode level. The resulting bytecode however should be the same.

As for the reason of why exactly, it seems to be performance-related, but I couldn't find a detailed explanation for why a frozenset would perform better in this context.

Upvotes: 4

Sven Marnach
Sven Marnach

Reputation: 602745

All objects in co_consts are constants, i.e. they are immutable. You shouldn't be able to, e.g., append to a list appearing as a literal in the source code and thereby modify the behaviour of the function.

The compiler usually represents list literals by listing all individual constants appearing in the list:

>>> def f():
...     a = [1, 2, 3]
...     return 1 in a
... 
>>> f.__code__.co_consts
(None, 1, 2, 3)

Looking at the byte code of this function we can see that the function builds a list at execution time each time the function is executed:

>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               2 (2)
              4 LOAD_CONST               3 (3)
              6 BUILD_LIST               3
              8 STORE_FAST               0 (a)

  3          10 LOAD_CONST               1 (1)
             12 LOAD_FAST                0 (a)
             14 COMPARE_OP               6 (in)
             16 RETURN_VALUE

Creating a new list is required in general, because the function may modify or return the list defined by the literal, in which case it needs to operate on a new list object every time the funciton is executed.

In other contexts, creating a new list object is wasteful, though. For this reason, Python's peephole optimizer can replace the list with a tuple, or a set with a frozen_set, in certain situations where it is known to be safe. One such situation is when the list or set literal is only used in an expression of the form x [not] in <list_literal>. Another such situation is when a list literal is used in a for loop.

The peephole optimizer is very simple. It only looks at one expression at a time. For this reason, it can't detect that this optimization would be safe in my definition of f above, which is functionally equivalent to your example.

Upvotes: 1

Related Questions