Natalia Resende

Reputation: 195

Separate words based on their last two characters in Python

I have a program that separates Portuguese words ending in a, as, e, es, o, or os. I created some lists, and as I loop through the file I assign each word to one of these lists based on its ending. Words that do not match any pattern go into a list named “other”. Now I want to separate the remaining words based on their last two characters. I thought I could do the same as before: for example, assign words ending in ‘em’ to a list named ‘em’, words ending in ‘ul’ to a list named ‘ul’, and so on. However, the code would become huge, because I checked and there are 470 other endings, so I would need to create 470 lists by hand. Does anyone have an idea how I could do this automatically, or any other solution to the problem? My code so far is below. Many thanks in advance!

from nltk.tokenize import sent_tokenize, wordpunct_tokenize
import re
import os
import io
import sys
from pathlib import Path

while True:
    try:
        file_to_open = Path(input("Please, insert your file path: "))
        with open(file_to_open, 'r', encoding="utf-8") as f:
            # Tokenize the lowercased text so that "words" is a list of
            # tokens; iterating over the raw string would yield characters
            words = wordpunct_tokenize(f.read().lower())
        break
    except FileNotFoundError:
        print("\nFile not found. Better try again")
    except IsADirectoryError:
        print("\nIncorrect directory path. Try again")

other=[]

e=[]
o=[]
a=[]

for y in words:
    # str.endswith accepts a tuple of suffixes
    if y.endswith(('a', 'as')):
        a.append(y)
    elif y.endswith(('o', 'os')):
        o.append(y)
    elif y.endswith(('e', 'es')):
        e.append(y)
    else:
        other.append(y)

# Last two characters of every word that matched none of the patterns above
otherendings = [t[-2:] for t in other]

print(len(otherendings))  # number of remaining words
print(set(otherendings))  # the distinct endings (470 of them)

Upvotes: 0

Views: 87

Answers (1)

abe

Reputation: 987

Create a dictionary whose keys are the word endings:

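word_dict = {}
for word in words:
    ending = word[-2:]
    try:
        # The ending has been seen before: extend its list
        word_dict[ending].append(word)
    except KeyError:
        # First word with this ending: start a new list
        word_dict[ending] = [word]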

After iterating over words, you will have a dictionary whose keys are two-letter strings, each mapping to the list of words that end with those two letters. (A one-character word is stored under that single character, because word[-2:] then returns the whole word.)
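The same grouping can also be written with collections.defaultdict from the standard library, which removes the need for the try/except. The following is only a minimal sketch: group_by_ending is a hypothetical helper name, and the sample list is made up to stand in for the "other" list from the question.

```python
from collections import defaultdict


def group_by_ending(words, n=2):
    """Group words by their last n characters (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        # A missing key is created automatically with an empty list
        groups[word[-n:]].append(word)
    # Convert to a plain dict so later lookups of unknown endings raise KeyError
    return dict(groups)


# Made-up sample standing in for the "other" list from the question
sample = ["homem", "jovem", "azul", "paul", "mulher"]
groups = group_by_ending(sample)

print(groups["em"])    # ['homem', 'jovem']
print(sorted(groups))  # ['em', 'er', 'ul']
```

With this approach, each of the 470 endings becomes a dictionary key, so groups["ul"] plays the role of the list that would otherwise have been named ul.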

Upvotes: 1
