Reputation: 63
I want to join two words separated by an asterisk (*) in a list of French words. After joining them, I want to check whether the resulting word exists in a French dictionary. If it does, the concatenated word should remain in the list; if not, it should be appended to another list. I have used yield (I'm new to it) in my code, but something is wrong with my nested if/else. Can anyone help me accomplish this? My unsuccessful code is below:
words = ['Bien', '*', 'venue', 'pour', 'les','engage', '*', 'ment','trop', 'de', 'YIELD', 'peut','être','contre', '*', 'productif' ]
with open ('Fr-dictionary.txt') as fr:
    dic = word_tokenize(fr.read().lower())
l=[ ]
def join_asterisk(ary):
    i, size = 0, len(ary)
    while i < size-2:
        if ary[i+1] == '*':
            if ary[i] + ary[i+2] in dic:
                yield ary[i] + ary[i+2]
                i+=2
            else: yield ary[i]
            i+=1
            l.append(ary[i] + ary[i+2])
    if i < size:
        yield ary[i]
print(list(join_asterisk(words)))
Upvotes: 1
Views: 5757
Reputation: 10952
Generators are perfect for this use case. You can think of a generator as a function that gives you the yielded values one by one instead of all at once (as return does). In other words, you can see it as a list that is not held in memory: you get the next element only when you ask for it. Also note that generators are just one way of building iterators.
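To make the "one by one" behaviour concrete, here is a minimal sketch. The name count_up is made up for illustration and is not part of the question's code.

```python
def count_up(limit):
    """Yield 1..limit lazily; the body does not run until a value is requested."""
    n = 1
    while n <= limit:
        yield n
        n += 1

gen = count_up(3)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # consumes the values that are left

print(first)
print(rest)
```

Once a generator has been consumed it is empty, so calling list(gen) a second time returns an empty list.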
What that means in your case is that you don't have to build a list l to keep track of the correct words, because the generator join_asterisk will yield them for you. All you need to do is iterate over the values the generator yields. That's exactly what list(generator) does: it builds a list by iterating over all the values of your generator.
In the end the code would look like this:
# Name the separator once so it is easy to change later
word_separator = '*'
words = ['Bien', word_separator, 'venue', 'pour', 'les','engage', word_separator, 'ment','trop', 'de', 'YIELD', 'peut', word_separator, "tard"]
# Fake dictionary
dic = {"Bienvenue", "pour", "les", "engagement", "trop", "de", "peut", "peut-être"}

def join_asterisk(ary):
    # Slide a window of three consecutive items over the list
    for w1, w2, w3 in zip(ary, ary[1:], ary[2:]):
        if w2 == word_separator:
            # w1 and w3 surround a separator: join them and report
            # whether the joined word is in the dictionary
            word = w1 + w3
            yield (word, word in dic)
        elif w1 != word_separator and w1 in dic:
            # A plain word found in the dictionary (other words are skipped)
            yield (w1, True)

correct_words = []
incorrect_words = []
for word, is_correct in join_asterisk(words):
    if is_correct:
        correct_words.append(word)
    else:
        incorrect_words.append(word)
print(correct_words)
print(incorrect_words)
This outputs
['Bienvenue', 'pour', 'les', 'engagement', 'trop', 'de']
['peuttard']
Also note that you can use list comprehensions instead of a for loop to fill the two lists:
correct_words = [w for w, correct in join_asterisk(words) if correct]
incorrect_words = [w for w, correct in join_asterisk(words) if not correct]
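One thing to keep in mind with the two comprehensions is that each of them calls join_asterisk(words) again, so the input is scanned twice. If that matters, a single pass is enough. The sketch below uses a stand-in generator, tagged_words (a hypothetical name that only mimics the (word, is_correct) pairs), so that it runs on its own.

```python
def tagged_words():
    # Stand-in for join_asterisk(words): yields (word, is_correct) pairs.
    yield ("Bienvenue", True)
    yield ("pour", True)
    yield ("peuttard", False)

correct_words, incorrect_words = [], []
for word, is_correct in tagged_words():
    # Choose the target list per pair, so the generator runs only once.
    (correct_words if is_correct else incorrect_words).append(word)

print(correct_words)
print(incorrect_words)
```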
Upvotes: 3
Reputation: 721
Aren't you looking for something like this:
def join_asterisk(ary):
    i, size = 0, len(ary)
    while i < size-2:
        if ary[i+1] == '*':
            if ary[i] + ary[i+2] in dic:
                yield ary[i] + ary[i+2]
                i+=2
            else:
                yield ary[i]
                i+=1
                l.append(ary[i] + ary[i+2])
    if i < size:
        yield ary[i]
The 'else' block follows the same indentation rules as any other block.
Putting a statement on the same line as an 'if', 'elif', 'else' or 'while' clause works, but if you want more than one statement associated with the clause, you have to use indentation or separate the statements with ';', like this:
while 1:print(9,end='');print(8)
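As a small illustration of that rule, the two hypothetical functions below (one_line and block are names invented for this sketch) behave identically. Statements chained with ';' after 'else:' all belong to the else, while a statement on the following line at the outer indentation level runs in both cases.

```python
def one_line(flag):
    out = []
    if flag: out.append("if")
    else: out.append("else"); out.append("still else")
    out.append("always")  # outer indentation: not part of the else
    return out

def block(flag):
    out = []
    if flag:
        out.append("if")
    else:
        out.append("else")
        out.append("still else")
    out.append("always")
    return out

print(one_line(False))
print(block(False))
```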
Upvotes: 0
Reputation: 61
It seems like the lines:
i+=1
l.append(ary[i] + ary[i+2])
are not indented enough and are therefore not part of the else branch. This means that every pair of words with a * in between will be appended to l instead of just the pairs that aren't in dic.
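For reference, here is one way the index-based generator could look with the indentation sorted out. It is only a sketch under a few assumptions that are not in the question: the dictionary and the rejected list are passed in as parameters (dic, rejected), the lookup lower-cases the joined word because the question lower-cases the dictionary file, and words that are not next to a separator are yielded without a dictionary check.

```python
def join_asterisk(ary, dic, rejected, sep="*"):
    """Yield words from ary, merging 'a', sep, 'b' into 'ab'.

    Merged words found in dic are yielded; the others go to rejected.
    """
    i, size = 0, len(ary)
    while i < size:
        if i + 2 < size and ary[i + 1] == sep:
            joined = ary[i] + ary[i + 2]
            if joined.lower() in dic:  # assumption: dic holds lower-case words
                yield joined
            else:
                rejected.append(joined)
            i += 3  # skip the left part, the separator and the right part
        else:
            yield ary[i]
            i += 1

words = ['Bien', '*', 'venue', 'pour', 'les', 'engage', '*', 'ment',
         'contre', '*', 'productif']
dic = {"bienvenue", "engagement"}  # toy dictionary, lower-cased
l = []
result = list(join_asterisk(words, dic, l))
print(result)  # ['Bienvenue', 'pour', 'les', 'engagement']
print(l)       # ['contreproductif']
```

Note that l is only filled while the generator is being consumed, so it stays empty until list(...) or a loop has run over it.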
Upvotes: 0