Reputation: 5056
If I have a list of 10,000 words, what's an optimized way to check if a word is in that list that won't slow the app down to a crawl?
Should I load the words in from a file and check against that?
def check_for_word(word):
HUGE_LIST = [...] # 10,000 Words
if word in HUGE_LIST:
return True
else:
return False
Upvotes: 0
Views: 267
Reputation: 25023
You may want to take a slightly indirect route, writing a function that, given a file containing all your words, returns a function to check membership
def make_checker(fname):
with open(fname) as f:
# Hp: one word per line, you can adjust the code for a different format
# this line builds a set by a _set comprehension_
words = {word.strip() for word in f}
def the_checker(word):
return word in words
return the_checker
That you can use like this
check_4 = make_checker('corpus_of_4_letter_words.txt')
...
if check_4(answer.strip()):
print('Please, please do not use these words.')
Upvotes: 0
Reputation: 23203
Convert a list to set - strings are hashable so set can be easily created.
Look-up in set is O(1), where for list it's O(n), where n is a length of list.
HUGE_SET = set(HUGE_LIST) # or frozenset, if it's constant and words won't be added to it
return word in HUGE_SET
Also, consider moving creation of huge list and huge set outside of function body. Right now list is recreated every time function is called.
List timings:
$ python -m timeit -s "words = list(map(str, xrange(10000)))" -n 10000 "'5000' in words"
10000 loops, best of 3: 58.2 usec per loop
Frozenset timings:
$ python -m timeit -s "words = frozenset(map(str, xrange(10000)))" -n 10000 "'5000' in words"
10000 loops, best of 3: 0.0504 usec per loop
Upvotes: 4
Reputation: 31250
Read the words in from a file and turn them into a set
of words. Checking membership of a set is very fast (and 10,000 is not "very large" :-)).
with open('words.txt') as words:
wordset = {word.strip() for word in words}
return word in wordset
(though it would help if you didn't have to read it in each time, keep it around in a variable -- building the set each time takes longer than checking if a word is in it in your original way)
Upvotes: 1
Reputation: 48057
If you won't be doing any modification in the list, use tuple
instead of list.
If the items in the list are unique, it is even better to use set
then.
Lookup will be faster with tuple/set as compared to lookup in list
Upvotes: 2