Reputation: 195
I have a program that separates Portuguese words ending in a, as, e, es, o, and os. I created a few lists, loop through the file, and assign each word to one of these lists based on its ending. Words that do not match any of the patterns go into a list named "other".
Now I want to separate all the remaining words by their last two characters. I thought I could do the same as before: for example, assign words ending in 'em' to a list named 'em', words ending in 'ul' to a list named 'ul', and so on. However, that would produce a huge amount of code, because I checked and there are 470 other endings, so I would need to create 470 lists by hand. Does anyone have an idea how I could do this automatically, or any other solution to the problem? My code so far is below. Many thanks in advance!
from nltk.tokenize import sent_tokenize, wordpunct_tokenize
import re
import os
import io
import sys
from pathlib import Path

while True:
    try:
        file_to_open = Path(input("Please, insert your file path: "))
        with open(file_to_open, 'r', encoding="utf-8") as f:
            # Tokenize so the loop below iterates over words, not characters
            words = wordpunct_tokenize(f.read().lower())
        break
    except FileNotFoundError:
        print("\nFile not found. Better try again")
    except IsADirectoryError:
        print("\nIncorrect directory path. Try again")

other = []
e = []
o = []
a = []

for y in words:
    if y[-1:] == 'a' or y[-2:] == 'as':
        a.append(y)
    elif y[-1:] == 'o' or y[-2:] == 'os':
        o.append(y)
    elif y[-1:] == 'e' or y[-2:] == 'es':
        e.append(y)
    else:
        other.append(y)

otherendings = []
for t in other:
    endings = t[-2:]
    otherendings.append(endings)

print(len(otherendings))
print(set(otherendings))  # 470 distinct endings
Upvotes: 0
Views: 87
Reputation: 987
You can create a dictionary whose keys are the word endings:
word_dict = {}
for word in words:
    ending = word[-2:]
    try:
        # Append to the list for this ending; on first occurrence the key
        # does not exist yet, so create the list in the except branch
        word_dict[ending].append(word)
    except KeyError:
        word_dict[ending] = [word]
After iterating over words, you will have a dictionary whose keys are two-letter strings and whose values are lists of the words ending with those two letters.
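For reference, the same grouping can also be written with collections.defaultdict from the standard library, which creates the empty list automatically on first access. This is a minimal sketch assuming words holds the tokenized word list from the question:

from collections import defaultdict

word_dict = defaultdict(list)
for word in words:
    # Group each word under its last two characters
    word_dict[word[-2:]].append(word)

# Example lookup: all words ending in 'em'
print(word_dict['em'])

Either way, the 470 endings become 470 dictionary keys, so no lists need to be created by hand.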
Upvotes: 1