Reputation: 11
I need to create a word list from a text file. The list is going to be used in a hangman game and needs to exclude the following from the list:
The word list then needs to be written to a file so that every word appears on its own line. The program also needs to output the number of words in the final list.
This is what I have, but it's not working properly.
def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()
    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
            if len(word) in range(5,100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    print L
MakeWordList()
Upvotes: 1
Views: 1470
Reputation: 10489
You're appending the word many times with this code. You aren't actually filtering out the words at all, just adding them a different number of times depending on how many if statements they pass.
You should combine all the if statements:
if word not in L and len(word) >= 5 and not 'xx' in word and word.islower():
    L.append(word)
Or if you want it more readable you can split them:
if word not in L and len(word) >= 5:
    if not 'xx' in word and word.islower():
        L.append(word)
But don't append after each one.
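As a rough sketch of the combined check (written for Python 3; the helper name filter_words and the sample lines are made up for illustration), note that line.split(' ') returns a list of words, so each word has to be tested on its own:

```python
def filter_words(lines):
    """Collect each qualifying word once, keeping first-seen order."""
    result = []
    for line in lines:
        # split() with no argument splits on any whitespace and drops the newline
        for word in line.split():
            if (word not in result and len(word) >= 5
                    and 'xx' not in word and word.islower()):
                result.append(word)  # the only append: one per accepted word
    return result


# Hypothetical sample input, two lines of text
sample = ["apple Banana apple\n", "boxxer cat orange\n"]
print(filter_words(sample))  # ['apple', 'orange']
```

Here "Banana" fails the lowercase test, the second "apple" is a duplicate, "boxxer" contains 'xx', and "cat" is too short.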
Upvotes: 2
Reputation: 17532
Improved code:
def MakeWordList():
    with open('possible.rtf','r') as f:
        # split the text into words; iterating over the raw string would yield single characters
        data = f.read().split()
    return set([word for word in data if len(word) >= 5 and word.islower() and not 'xx' in word])
set(iterable) returns a set object that has no duplicates (all items in a set must be unique). [word for word ...] is a list comprehension, which is a shorter way of creating simple lists. You can iterate over every word in data (this assumes each word is on a separate line). The condition if len(word) >= 5 and word.islower() and not 'xx' in word accomplishes the final three requirements (must be at least 5 letters long, have only lowercase letters, and cannot contain 'xx').
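The question also asks for the list to be written out one word per line and for the word count to be reported. A minimal Python 3 sketch of that, under some assumptions: the function name make_word_list and its two path parameters are hypothetical, and the input is treated as plain whitespace-separated text (a real .rtf file also contains formatting markup that would need to be stripped first).

```python
def make_word_list(in_path, out_path):
    """Filter words from in_path, write one per line to out_path, return the count."""
    with open(in_path, 'r') as f:
        words = f.read().split()  # split on any whitespace, including newlines

    # The set comprehension drops duplicates; sorted() gives a stable output order.
    kept = sorted({w for w in words
                   if len(w) >= 5 and w.islower() and 'xx' not in w})

    with open(out_path, 'w') as out:
        for w in kept:
            out.write(w + '\n')  # every word on its own line

    print(len(kept))  # number of words in the final list
    return len(kept)
```

It could be called as make_word_list('possible.txt', 'wordlist.txt'), where both file names are placeholders.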
Upvotes: 0
Reputation: 3524
Think about it: in your nested if-statements, ANY word that is not already in the list will make it through on your first line. Then if it is 5 or more characters, it will get added again (I bet), and again, etc. You need to rethink your logic in the if statements.
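To see the effect concretely, here is a small Python 3 sketch that mimics those checks for a single word, assuming the if statements are nested as described. The helper name nested_appends is hypothetical, and it counts appends instead of building the real list.

```python
def nested_appends(word, seen):
    """Mimic the question's nested ifs for one word; return how often it gets appended."""
    added = []
    if word not in seen:
        added.append(word)
        if len(word) in range(5, 100):
            added.append(word)
            if not word.endswith('xx'):
                added.append(word)
                if word == word.lower():
                    added.append(word)
    return len(added)


print(nested_appends('cat', []))              # 1: too short, yet still appended once
print(nested_appends('orange', []))           # 4: passes every check, appended four times
print(nested_appends('orange', ['orange']))   # 0: already in the list
```

No word is ever rejected; failing a check only stops further copies from being added.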
Upvotes: 0