Reputation: 1
I am looking to write a function that takes a list of words (wordlist), opens a txt file and returns a list of words that don't appear in the txt file. This is what I have so far...
def check_words_in_file(wordlist):
    """Return a list of words that don't appear in words.txt"""
    words = set()
    words = open("words.txt").read().splitlines()
    return [x for x in wordlist if x not in words]
The problem I am having with this function is that it is too slow. If I use a wordlist consisting of, say, 10,000 words, it takes about 15 seconds to complete. If I use one with 300,000 words, it takes far longer than it should. Is there any way I can make this function faster?
Upvotes: 0
Views: 79
Reputation: 63707
The problem lies in how variables are bound to objects, which is evident when you write
words = set()
words = open("words.txt").read().splitlines()
In the first line, you create an empty set object and bind the variable words to it. In the second line, you open the file and split its content into lines, which returns a list, and rebind the variable words to that list. The set is discarded, so every membership test in the list comprehension scans the list from the start.
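A quick way to see the rebinding is to check the type of the object the name refers to after each assignment. This is only a sketch: the sample text is made up and stands in for the contents of words.txt.

```python
# Hypothetical stand-in for the text read from words.txt.
file_text = "apple\nbanana\ncherry"

words = set()                     # the name refers to an empty set
first_type = type(words).__name__

words = file_text.splitlines()    # the name is rebound to a new list
second_type = type(words).__name__

print(first_type, second_type)    # set list
```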
You probably intended to write
words = set(open("words.txt").read().splitlines())
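Put together, the corrected function might look like the sketch below. The path parameter is an assumption added here so the function can be tried against any file; the original hard-codes "words.txt". A membership test against a list is a linear scan, while a membership test against a set is a hash lookup that takes constant time on average, which is where the speedup comes from.

```python
def check_words_in_file(wordlist, path="words.txt"):
    """Return the words from wordlist that don't appear in the file at path."""
    # path is a hypothetical parameter; the original always reads words.txt.
    with open(path) as fin:
        words = set(fin.read().splitlines())  # build the set once
    # Keeps the order and any duplicates of wordlist.
    return [x for x in wordlist if x not in words]
```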
Further improvement
You can actually do better if you create a set from the argument wordlist and take its asymmetric set difference with the lines of the file
words = set(wordlist).difference(open("words.txt").read().splitlines())
return list(words)
Nitpick
It is generally not advised to open a file and leave the file handle to be garbage collected. Either close the file explicitly or use a context manager
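from itertools import imap  # Python 2 only; on Python 3 the built-in map is already lazy

with open("words.txt") as fin:
    # Strip the trailing newline from each line as the file is read
    words = set(wordlist).difference(imap(str.strip, fin))
return list(words)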
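For Python 3, where itertools.imap no longer exists, the same idea can be sketched as a complete function using the built-in map. The path parameter is an assumption added for illustration. Note that going through a set means the result no longer keeps the order of wordlist and drops duplicate words, unlike the list comprehension in the question.

```python
def check_words_in_file(wordlist, path="words.txt"):
    """Return the words from wordlist that don't appear in the file at path."""
    # path is a hypothetical parameter; the original always reads words.txt.
    with open(path) as fin:
        # map is lazy on Python 3, so lines are stripped one at a time
        # while difference() consumes the file.
        missing = set(wordlist).difference(map(str.strip, fin))
    return list(missing)  # order is arbitrary, duplicates are removed
```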
Upvotes: 7