Reputation: 31
I am attempting to write a program to find words in the English language that contain 3 letters of your choice, in order, but not necessarily consecutively. For example, the letter combination EJS would output, among others, the word EJectS. You supply the letters, and the program outputs the words.
However, the program does not give the letters in the right order, and does not work at all with double letters, like the letters FSF or VVC. I hope someone can tell me how I can fix this error.
Here is the full code:
with open("words_alpha.txt") as words:
    wlist = list(words)
while True:
    elim1 = []
    elim2 = []
    elim3 = []
    search = input("input letters here: ")
    for element1 in wlist:
        element1 = element1[:-1]
        val1 = element1.find(search[0])
        if val1 > -1:
            elim1.append(element1)
    for element2 in elim1:
        val2 = element2[(val1):].find(search[2])
        if val2 > -1:
            elim2.append(element2)
    for element3 in elim2:
        val3 = element3[((val1+val2)):].find(search[1])
        if val3 > -1:
            elim3.append(element3)
    print(elim3)
Upvotes: 1
Views: 2801
Reputation: 1379
Read the file with read() and, since each word is on its own line, call split('\n') to build the word list without trailing newlines. The logic is simple: if all the letters are in the word, find the index of each letter, starting each search just after the previous match, and check that the indexes are in increasing order.
with open('words_alpha.txt') as file:
    word_list = file.read().split('\n')
search = input("input letters here: ").lower()
found = []
for word in word_list:
    if all(x in word for x in search):
        i = word.find(search[0])
        j = word.find(search[1], i + 1)
        k = word.find(search[2], j + 1)
        if i < j < k:
            found.append(word)
print(found)
Using a function:
def get_words_with_letters(word_list, search):
    search = search.lower()
    for word in word_list:
        if all(x in word for x in search):
            # Look for each letter just after the previous match
            i = word.find(search[0])
            j = word.find(search[1], i + 1)
            k = word.find(search[2], j + 1)
            if i < j < k:
                yield word
words = list(get_words_with_letters(word_list, 'fsf'))
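To try the generator without the full word file, a small in-memory list is enough. The sketch below repeats the function so that it runs on its own; the sample words are made up for illustration and are not taken from words_alpha.txt.

```python
def get_words_with_letters(word_list, search):
    search = search.lower()
    for word in word_list:
        if all(x in word for x in search):
            i = word.find(search[0])
            j = word.find(search[1], i + 1)
            k = word.find(search[2], j + 1)
            if i < j < k:
                yield word

# Hypothetical sample list, standing in for the contents of the file.
sample_words = ['falsify', 'fluffs', 'staff', 'ejects']

double_letter_hits = list(get_words_with_letters(sample_words, 'fsf'))
plain_hits = list(get_words_with_letters(sample_words, 'EJS'))
print(double_letter_hits)  # ['falsify']
print(plain_hits)          # ['ejects']
```

Note that 'fluffs' and 'staff' contain both f and s but are rejected, because no second f follows the s.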
Upvotes: 1
Reputation: 11224
The issue with your code is that you're using val1 from a specific word in your first loop for a different word in your second loop. So val1 will be the wrong value most of the time: for every word in your second loop, you're using the position of the first letter in the last word checked by your first loop.
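To see the effect in isolation, here is a small sketch with a made-up three-word list (the list and the `positions` dictionary are illustrative, not part of the question's code). Once the first loop finishes, `val1` holds only the position from the last word examined.

```python
# Hypothetical mini word list, just for illustration.
words = ["eject", "jest", "subject"]
search = "ejs"

matches = []
for word in words:
    val1 = word.find(search[0])
    if val1 > -1:
        matches.append(word)

# The per-word positions that a later loop would actually need.
positions = {word: word.find(search[0]) for word in matches}

print(val1)       # 4, the position of "e" in the last word, "subject"
print(positions)  # {'eject': 0, 'jest': 1, 'subject': 4}
```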
There are a lot of ways to solve what you're trying to do. However, my code below should be fairly close to what you had in mind with your solution. I have tried to explain everything that's going on in the comments:
# Read words from file
with open("words_alpha.txt") as f:
    words = f.readlines()
# Begin infinite loop
while True:
    # Get user input
    search = input("Input letters here: ")
    # Loop over all words
    for word in words:
        # Remove newline characters at the end
        word = word.strip()
        # Start looking for the letters at the beginning of the word
        position = -1
        # Check position for each letter
        for letter in search:
            # Search from just after the previous match; the result is an
            # index into the whole word, not into a slice of it
            position = word.find(letter, position + 1)
            # Break out of loop if letter not found
            if position < 0:
                break
        # If there was no `break` in the loop, the word contains all letters
        else:
            print(word)
For every new letter we start looking beginning at position + 1, where position is the position of the previously found letter. (That's why we have to do position = -1, so we start looking for the first letter at -1 + 1 = 0.)
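The same scan can be wrapped in a small helper that is easy to test. This is a sketch rather than the exact code above: the name `has_letters_in_order` is made up here, and it passes the start offset to `str.find` so that every position stays relative to the whole word.

```python
def has_letters_in_order(word, letters):
    """Return True if `letters` appear in `word` in order (gaps allowed)."""
    position = -1
    for letter in letters:
        # Search only after the previously matched letter.
        position = word.find(letter, position + 1)
        if position < 0:
            return False
    return True

print(has_letters_in_order("ejects", "ejs"))   # True
print(has_letters_in_order("falsify", "fsf"))  # True
print(has_letters_in_order("aab", "abb"))      # False
```

The last call shows why the offset matters for double letters: "aab" has only one b, so a search for "abb" must fail.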
You should ideally move the removal of \n outside of the loop, so that you only have to do it once and not for every search. I just left it inside the loop for consistency with your code.
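One way to do that is to strip every line once, right after reading. The sketch below uses `io.StringIO` as a stand-in for `open("words_alpha.txt")` purely so that it runs anywhere; the contents are made up.

```python
import io

# Stand-in for open("words_alpha.txt"); the contents are illustrative.
fake_file = io.StringIO("eject\njest\nsubject\n")

with fake_file as f:
    # Strip the trailing newline from every line once, up front.
    words = [line.strip() for line in f]

print(words)  # ['eject', 'jest', 'subject']
```

Every later search can then loop over `words` directly.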
Also, by the way, there's no handling of uppercase/lowercase for now. So, for example, should the search for abc be different from Abc? I'm not sure what you need there.
Upvotes: 0
Reputation: 51034
You are making this very complicated for yourself. To test whether a word contains the letters E, J and S in that order, you can match it with the regex E.*J.*S:
>>> import re
>>> re.search('E.*J.*S', 'EJectS')
<_sre.SRE_Match object; span=(0, 6), match='EJectS'>
>>> re.search('E.*J.*S', 'JEt engineS') is None
True
So here's a simple way to write a function which tests for an arbitrary combination of letters:
import re
def contains_letters_in_order(word, letters):
    regex = '.*'.join(map(re.escape, letters))
    return re.search(regex, word) is not None
Examples:
>>> contains_letters_in_order('EJectS', 'EJS')
True
>>> contains_letters_in_order('JEt engineS', 'EJS')
False
>>> contains_letters_in_order('ABra Cadabra', 'ABC')
True
>>> contains_letters_in_order('Abra CadaBra', 'ABC')
False
If you want to test every word in a wordlist, it is worth doing pattern = re.compile(regex) once, and then pattern.search(word) for each word.
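A minimal sketch of that, assuming the words are already in a list; the function name `find_words` and the sample list are made up for illustration.

```python
import re

def find_words(words, letters):
    # Build and compile the pattern once, then reuse it for every word.
    pattern = re.compile('.*'.join(map(re.escape, letters)))
    return [word for word in words if pattern.search(word)]

# Hypothetical sample list, standing in for a real wordlist.
sample = ['ejects', 'jest', 'falsify', 'staff']
print(find_words(sample, 'ejs'))  # ['ejects']
print(find_words(sample, 'fsf'))  # ['falsify']
```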
Upvotes: 4