marcelosalloum

Reputation: 3531

Comparing strings with Python: X in ['item1', 'item2', ...] vs. X == 'item1', X == 'item2', ...?

I am fairly new to Python and I have a question about performance when comparing strings. Both snippets below seem to achieve what I want, but is there any reason to prefer one over the other?

Option 1:

if first_column_header == 'author name' or first_column_header == 'author' or first_column_header == 'name':

Option 2:

if first_column_header in ['author name', 'author', 'name']:

Upvotes: 3

Views: 225

Answers (3)

kindall

Reputation: 184211

If you have a lot of choices, say more than a dozen, and speed is really critical, then use a set. It's the fastest to check membership, although there's some overhead since the item being checked needs to be hashed. Define the set ahead of time so it's not redefined each time the if statement is executed.

first_column_names = {"author name", "author", "name"}

# In Python before 2.7, you must use `set()` instead:
first_column_names = set(("author name", "author", "name"))

if first_column_header in first_column_names:

But if speed is critical, what are you writing it in Python for to begin with? :-) Generally you'll want to go with what's more readable. In this situation, that'll be a list or a tuple. Defining a tuple literal is marginally faster so, with readability being equal, I'd go that way:

if first_column_header in ("author name", "author", "name"):
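For the set variant, here is a minimal sketch of "define it ahead of time". The constant `AUTHOR_HEADERS` and the helper `is_author_column` are hypothetical names used only for illustration; the point is just that the set is built once at import time rather than on every check.

```python
# Hypothetical module-level constant: built once, not on every call.
AUTHOR_HEADERS = frozenset({"author name", "author", "name"})


def is_author_column(header):
    """Return True if header is one of the accepted author column names."""
    return header in AUTHOR_HEADERS


print(is_author_column("author"))  # True
print(is_author_column("title"))   # False
```

A `frozenset` is used here only to signal that the collection is not meant to change; a plain `set` behaves the same for membership tests.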

Upvotes: 3

Paulo Bu

Reputation: 29794

Option 2 is definitely shorter and more Pythonic. It may also add a little overhead to your code, because it creates a list and then iterates through it.

This is a trade-off you accept when making programs more readable but, IMHO, the overhead is too small to worry about, so I'd go with Option 2.
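If you want to check the overhead on your own machine, a rough sketch with `timeit` could look like the following. The function names `option1` and `option2` and the test value are assumptions for illustration, and the numbers will vary with the Python version and hardware, so no expected timings are shown.

```python
import timeit

# Assumed test value: it matches only the last alternative,
# so Option 1 has to perform all three comparisons.
header = "name"


def option1():
    return header == "author name" or header == "author" or header == "name"


def option2():
    return header in ["author name", "author", "name"]


# Time 100,000 calls of each variant.
t1 = timeit.timeit(option1, number=100_000)
t2 = timeit.timeit(option2, number=100_000)
print(f"Option 1: {t1:.4f}s  Option 2: {t2:.4f}s")
```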

Hope this helps!

Upvotes: 8

Bartlomiej Lewandowski

Reputation: 11180

I have to disagree with Paulo's suggestion that Option 1 is faster. Here is what dis shows for these two functions:

def t():
    if a == 'author name' or a == 'author' or a == 'name':
        return True
    return False

def t2():
    if a in ['author name','author','name']:
        return True
    return False

It seems that a is loaded up to three times in the first case, while in Option 2 the list literal is not built on each call: the compiler stores it as a constant tuple and loads it with a single LOAD_CONST.

  3           0 LOAD_GLOBAL              0 (a) 
              3 LOAD_CONST               1 ('author name') 
              6 COMPARE_OP               2 (==) 
              9 POP_JUMP_IF_TRUE        36 
             12 LOAD_GLOBAL              0 (a) 
             15 LOAD_CONST               2 ('author') 
             18 COMPARE_OP               2 (==) 
             21 POP_JUMP_IF_TRUE        36 
             24 LOAD_GLOBAL              0 (a) 
             27 LOAD_CONST               3 ('name') 
             30 COMPARE_OP               2 (==) 
             33 POP_JUMP_IF_FALSE       40 

  4     >>   36 LOAD_CONST               4 (True) 
             39 RETURN_VALUE         

  5     >>   40 LOAD_CONST               5 (False) 
             43 RETURN_VALUE         

  8           0 LOAD_GLOBAL              0 (a) 
              3 LOAD_CONST               4 (('author name', 'author', 'name')) 
              6 COMPARE_OP               6 (in) 
              9 POP_JUMP_IF_FALSE       16 

  9          12 LOAD_CONST               5 (True) 
             15 RETURN_VALUE         

 10     >>   16 LOAD_CONST               6 (False) 
             19 RETURN_VALUE         
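To reproduce this yourself, something like the sketch below should work. The value assigned to `a` is an assumption for illustration, and the exact opcodes and offsets differ between CPython versions, so the output will not match the listing above byte for byte.

```python
import dis

a = "author"  # assumed sample value for the global used by t2


def t2():
    if a in ['author name', 'author', 'name']:
        return True
    return False


# Print the bytecode for t2.
dis.dis(t2)

# Check whether the list literal ended up as a tuple in the code constants.
folded = ('author name', 'author', 'name') in t2.__code__.co_consts
print("list literal folded into a tuple constant:", folded)
```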

Upvotes: 1
