Reidel
Reidel

Reputation: 377

What does keyword "in" really do when is used to test if a sequence (list, tuple, string etc.) contains a value. A loop until find the value?

given the following code:

numbers= [1,2,3,1,2,4,5]
if 5 in numbers:
    .................

we can notice we have a list(numbers) with 7 items and I want to know if the keyword in behind the scene do a loop for find a match in the list

Upvotes: 1

Views: 139

Answers (2)

abarnert
abarnert

Reputation: 365717

For container types in general, this is documented under __contains__ in the data model chapter and Membership test operations in the expressions chapter.

When you write this:

x in s

… what Python does is effectively (slightly oversimplified):

try:
    return s.__contains__(x)
except AttributeError:
    for value in s:
        if value == x:
            return True
    return False

So, any type that defines a __contains__ method can do whatever it wants; any type that doesn't, Python automatically loops over it as an iterable. (Which in turn calls s.__iter__ if present, the old-style sequence API with s.__getitem__ if not.)


For the builtin sequence types list and tuple, the behavior is defined under Sequence Types — list, tuple, range and (again) Membership test operations:

True if an item of s is equal to x, else False

… which is exactly the same as the fallback behavior.

Semantically:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

Fpr list and tuple, this is in fact implemented by looping over all of the elements, and it's hard to imagine how it could be implemented otherwise. (In CPython, it's slightly faster because it can do the loop directly over the underlying array, instead of using an iterator, but it's still linear time.)

However, some other builtin types do something smarter. For example:

  • range does the closed-form arithmetic to check whether the value is in the range in constant time.
  • set, frozenset, and dict use the underlying hash table to look up the value in constant time.

There are some third-party types, like sorted-dict collections based on trees or skiplists or similar, that can't search in constant time but can search in logarithmic time by walking the tree. So, they'll (hopefully) implement __contains__ in logarithmic time.


Also note that if you use the ABC/mixin helpers in collections.abc to define your own Sequence or Mapping type, you get a __contains__ implementation for free. For sequences, this works by iterating over all the elements; for mappings, it works by try: self.[key].

Upvotes: 5

nosklo
nosklo

Reputation: 222862

It depends. It will call __contains__() on the container class (right side) - that can be implemented as a loop, and for some classes it can be calculated by some other faster method if possible.

You can even define it on your own class, like this ilustrative example:

class ContainsEverything:
    def __contains__(self, item):
        return True

c = ContainsEverything()

>>> None in c
True
>>> 4 in c
True

Upvotes: 8

Related Questions