Reputation: 4144
I'm wondering because I need to have have a function that is disgustingly fast at checking if a word is in a dictionary list - I'm considering leaving the dictionary as a large string and running regex against instead. This needs to be absurdly fast. So I just need a basic overview of how python handles checking if a string is in a list of strings and if its beyond-reasonable fast.
Upvotes: 3
Views: 310
Reputation: 29
Running a regex against your word list is a very bad idea; it scales very badly.
Using dict()
, set()
or frozenset()
will scale a lot better:
s = set(['one','two','three'])
'two' in s ## true
b='four'
b in s ## false
s.add('four')
b in s ## true
Upvotes: 0
Reputation: 5195
According to this site, x in s
is O(n). Therefore, it checks each entry (in the worst case).
At any rate, do not use a regex. Using sets or lists is a much more intuitive way to represent the data and regexes will not perform better than O(n).
Upvotes: 2
Reputation: 20912
Use a set. If you need case-insensitive checking, just store the words into the set downcased. Then when checking if a certain word is in the set, downcase the word before checking membership.
The general rule is: normalize entries when building the set, and normalize an item before checking against the set. Another example of normalization is collapsing consecutive whitespace chars into a single space and stripping leading/trailing whitespace.
Upvotes: 0
Reputation: 65864
If you want a blazingly fast membership test, then a list is the wrong data structure. Take a look at the implementation of list_contains
in listobject.c
, line 437. It iterates over the list in order, comparing the item with each element in turn. The later the item appears in the list, the longer it will take to find it, and if the item is missing, then the whole list must be scanned.
Use a set instead. Sets are implemented internally by a hash table, so looking up an object involves computing its hash and then scanning a few table entries (usually just one). For the particular case of looking up a string, see set_lookkey_string
in setobject.c
, line 156.
Upvotes: 10
Reputation: 987
You probably want to use a Set if you're worried about time. A Set is much like a list, but it checks for membership based on hashing.
Upvotes: 0
Reputation: 375754
A set of strings will have O(1) lookup time: effectively constant regardless of the size of the set. Making a set from your list of strings is easy:
my_set = set(my_list)
if my_word in my_set:
print "it's there!"
Upvotes: 4
Reputation: 288070
If you're using a regular list, consider a set
instead.
If you want to implement your own fine-tuned membership test for your container object, override __contains__
.
Upvotes: 1
Reputation: 117761
If you need real fast checking, use a set
:
words = set(words_list)
if "hello" in words:
print("hello found!"")
A set is faster because it uses a hash-algorithm, instead of a direct search approach.
Upvotes: 2