Samuel Tan
Samuel Tan

Reputation: 1750

Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership

this might be a noob question or blindingly obvious to those who understand more computer science than I do. Perhaps that is why I could not find anything from Google or SO after some searching. Maybe I'm not using the right vocabulary.

The title says it all. If I know that x is in my_list most of the time, which of the following is faster?

if x in my_list:
    func1(x)
else:
    func2(x)

Or

if x not in my_list:
    func2(x)
else:
    func1(x)

Does the size of the list matter? E.g. ten elements vs 10,000 elements? For my particular case my_list consists of strings and integers, but does anyone have any idea if other considerations apply to more complicated types such as dicts?

Thank you.

Upvotes: 2

Views: 1692

Answers (4)

verisimilidude
verisimilidude

Reputation: 475

Python includes a module and function timeit that can tell you how long a snippet of code takes to execute. The snippet must be a single statement, which leaves out directly timing a compound statement like an if but we can wrap your statements in a function and time the function call.

Even easier than calling timeit.timeit() is using a jupyter notebook and using the magic %timeit magic statement at the beginning of a line.

This proves that long list or short, succeeding or failing, the two ways you ask about, checking in alist or not in alist, give timings that are the same within the variability of measurement.

import random

# set a seed so results will be repeatable
random.seed(456789)

# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)] 

def func1(x):
    # included just so we get a function call
    return True

def func2(x):
    # included just so we get a function call
    return False

def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result

def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result

%timeit way1(101) # failure with large list

The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop

%timeit way1(0) # success with large list

The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop

my_list.index(0)

186

%timeit way2(101) # failure with large list

The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop

%timeit way2(0) # success with large list

The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop

my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value

-37

# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop

# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop

Upvotes: 1

Odo Frodo
Odo Frodo

Reputation: 26

One desired characteristic in software implementations is to have low coupling. Your implementation should not be defined by the way your Python interpreter tests for list membership, as that is a high level of coupling. It could be that the implementation changes and it is no longer the faster way.

All that we should care about in this case is that testing for membership in a list is linear on the size of the list. If faster membership testing is desired you could use a set.

Upvotes: 0

Gajo Petrovic
Gajo Petrovic

Reputation: 351

There should be no noticeable performance difference. You are better off writing whichever one makes your code more readable. Either one will be O(n) complexity, and will mostly depend where the element is located in the list. Also you should avoid optimizing prematurely, it doesn't matter for most use cases, and when it does, you are usually better off using other data structures.

If you want to lookups with faster performance, use dicts, they are likely to have O(1) complexity. For details refer to https://wiki.python.org/moin/TimeComplexity .

Upvotes: 2

Akavall
Akavall

Reputation: 86168

Checking if element is in a list or if element is not in a list calling the same operation x in my_list, so there should not be any difference.

Does the size of the list matter?

Checking if element is in a list is an O(N) operation, this means that the size does matter, roughly proportionately.

If you need to do checking a lot, you probably want to look into set, checking if an element is in a set is O(1), this means that checking time does not change much as size of set increases.

Upvotes: 4

Related Questions