Jon Phenow
Jon Phenow

Reputation: 4144

How does Python handle checking 'if object in list'

I'm wondering because I need to have have a function that is disgustingly fast at checking if a word is in a dictionary list - I'm considering leaving the dictionary as a large string and running regex against instead. This needs to be absurdly fast. So I just need a basic overview of how python handles checking if a string is in a list of strings and if its beyond-reasonable fast.

Upvotes: 3

Views: 310

Answers (8)

Kalleguld
Kalleguld

Reputation: 29

Running a regex against your word list is a very bad idea; it scales very badly. Using dict(), set() or frozenset() will scale a lot better:

s = set(['one','two','three'])
'two' in s     ## true

b='four'
b in s         ## false

s.add('four')
b in s         ## true

Upvotes: 0

David Weiser
David Weiser

Reputation: 5195

According to this site, x in s is O(n). Therefore, it checks each entry (in the worst case).

At any rate, do not use a regex. Using sets or lists is a much more intuitive way to represent the data and regexes will not perform better than O(n).

Upvotes: 2

Kelvin
Kelvin

Reputation: 20912

Use a set. If you need case-insensitive checking, just store the words into the set downcased. Then when checking if a certain word is in the set, downcase the word before checking membership.

The general rule is: normalize entries when building the set, and normalize an item before checking against the set. Another example of normalization is collapsing consecutive whitespace chars into a single space and stripping leading/trailing whitespace.

Upvotes: 0

Gareth Rees
Gareth Rees

Reputation: 65864

If you want a blazingly fast membership test, then a list is the wrong data structure. Take a look at the implementation of list_contains in listobject.c, line 437. It iterates over the list in order, comparing the item with each element in turn. The later the item appears in the list, the longer it will take to find it, and if the item is missing, then the whole list must be scanned.

Use a set instead. Sets are implemented internally by a hash table, so looking up an object involves computing its hash and then scanning a few table entries (usually just one). For the particular case of looking up a string, see set_lookkey_string in setobject.c, line 156.

Upvotes: 10

Abe Schneider
Abe Schneider

Reputation: 987

You probably want to use a Set if you're worried about time. A Set is much like a list, but it checks for membership based on hashing.

Upvotes: 0

Ned Batchelder
Ned Batchelder

Reputation: 375754

A set of strings will have O(1) lookup time: effectively constant regardless of the size of the set. Making a set from your list of strings is easy:

my_set = set(my_list)
if my_word in my_set:
    print "it's there!"

Upvotes: 4

phihag
phihag

Reputation: 288070

If you're using a regular list, consider a set instead.

If you want to implement your own fine-tuned membership test for your container object, override __contains__.

Upvotes: 1

orlp
orlp

Reputation: 117761

If you need real fast checking, use a set:

words = set(words_list)
if "hello" in words:
    print("hello found!"")

A set is faster because it uses a hash-algorithm, instead of a direct search approach.

Upvotes: 2

Related Questions