Reputation: 11
I have a list of word suffixes, and my aim is to split the words of an entered sentence into the suffixes in the list.
My problem is that the suffixes in this list also split words inside the root. For instance:
(internationally) >> should be >> (interna _tion _al _ly), but my code's output is >> (int _erna _tion _al _ly)
Note: I have "er" in my list.
One solution could be to search each word starting from its end. For example, the code first takes the letter "y"; if it matches the list, it is split off, and if not, the code keeps adding letters. "ly" matches, so it is split off. The code then resets and continues with "l" > "al", splits that off, and carries on. Continuing like this, "erna" never matches, so it is not split.
Searching this way makes the problem go away, but I couldn't work out how to do it.
I would be very happy if you could show me the way.
sentence = input()
suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
            "ty", "ment", "ness", "ship", "sion", "tion", "ate",
            "en", "fy", "ize", "able", "ible", "al",
            "esque", "ful", "ic", "ous", "ish", "ive",
            "less", "ed", "ing", "ly", "ward", "wise"]
for x in suffixes:
    y = " _" + x
    sentence = sentence.replace(x, y)
Upvotes: 0
Views: 880
Reputation: 13939
Here is a way using endswith() and string slicing:
suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
            "ty", "ment", "ness", "ship", "sion", "tion", "ate",
            "en", "fy", "ize", "able", "ible", "al",
            "esque", "ful", "ic", "ous", "ish", "ive",
            "less", "ed", "ing", "ly", "ward", "wise"]

def find_suffix(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            suffix_removed = word[:-len(suffix)]  # part before suffix
            return find_suffix(suffix_removed) + f' _{suffix}'  # recurse
    return word  # if no suffix is found, return the word as is

print(find_suffix('internationally'))  # interna _tion _al _ly
print(find_suffix('egoistically'))  # ego _ist _ic _al _ly
Recursion is not essential; the same can be done with just a loop.
Python 3.9 introduced the string method removesuffix(), which does essentially the same thing as the slicing above and returns the string unchanged when it does not end with the suffix. If you are using Python 3.9+, you can instead write suffix_removed = word.removesuffix(suffix) for readability (although I have not tested this since I use 3.8).
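As a rough illustration of both remarks, here is a minimal sketch of a loop-based variant that uses removesuffix(). It assumes Python 3.9+, and find_suffix_iterative is a hypothetical name that does not appear in the answer above.

```python
suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
            "ty", "ment", "ness", "ship", "sion", "tion", "ate",
            "en", "fy", "ize", "able", "ible", "al",
            "esque", "ful", "ic", "ous", "ish", "ive",
            "less", "ed", "ing", "ly", "ward", "wise"]

def find_suffix_iterative(word):
    # Hypothetical loop-based counterpart of find_suffix (assumes Python 3.9+)
    found = []
    matched = True
    while matched:
        matched = False
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word.removesuffix(suffix)  # same effect as word[:-len(suffix)]
                found.insert(0, suffix)  # keep suffixes in reading order
                matched = True
                break  # rescan the list against the shortened word
    return ' '.join([word] + [f'_{s}' for s in found])

print(find_suffix_iterative('internationally'))  # interna _tion _al _ly
print(find_suffix_iterative('egoistically'))  # ego _ist _ic _al _ly
```

The endswith() check is kept because removesuffix() silently returns the original string when there is no match, so it cannot signal on its own whether anything was removed.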
Per OP's request, the following is a function that applies the above to each word in a sentence.
def suffixify_sentence(sentence):
    return ' '.join(find_suffix(word) for word in sentence.split())

sentence = 'humanity internationally faithfully picturesque'
print(suffixify_sentence(sentence))  # humani _ty interna _tion _al _ly faith _ful _ly pictur _esque
Upvotes: 2
Reputation: 33022
str.replace() is the problem. It replaces the substring anywhere, not just at the end. Instead you can use str.endswith() or, if you're using 3.9+, str.removesuffix().
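A quick way to see the difference, using the word from the question:

```python
word = "internationally"

# replace() rewrites every occurrence, including the "er" inside the root
print(word.replace("er", " _er"))  # int _ernationally

# endswith() only looks at the end of the string
print(word.endswith("er"))  # False
print(word.endswith("ly"))  # True
```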
Here's an iterative implementation using str.endswith().
def remove_suffixes(string, suffixes):
    """
    Remove all suffixes from string. Return the root and suffixes.

    >>> remove_suffixes('smartly', ['y', 'ly'])
    ('smart', ['ly'])
    """
    # Sort to ensure the longest ones match first
    suffixes = sorted(suffixes, key=len, reverse=True)
    removed = []
    prev = None  # Loop variable
    while prev != string:  # i.e. break if unchanged
        prev = string  # Copy for next loop
        for suffix in suffixes:
            if string.endswith(suffix):
                removed.append(suffix)
                string = string[:-len(suffix)]
                break  # Rescan from the longest suffix again
    return string, removed[::-1]
suffixes = [
    "acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
    "ty", "ment", "ness", "ship", "sion", "tion", "ate",
    "en", "fy", "ize", "able", "ible", "al",
    "esque", "ful", "ic", "ous", "ish", "ive",
    "less", "ed", "ing", "ly", "ward", "wise"]
s_out, found = remove_suffixes('internationally', suffixes)
# > 'interna', ['tion', 'al', 'ly']
print(s_out, *found, sep=' _') # -> interna _tion _al _ly
Upvotes: 2
Reputation: 789
I'm not sure if your algorithm will work in all cases, but it seemed fun to implement, so here it is:
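# Uses the suffixes list from the question
sentence = 'internationally'
sentence = list(sentence)
stack = []
results = []
for i in sentence[::-1]:
    stack.insert(0, i)  # grow the candidate from the end of the word
    guess = ''.join(stack)
    if guess in suffixes:
        results.insert(0, f'_{guess}')
        stack = []  # reset and look for the next suffix
# Whatever is left on the stack is the root (empty if the last guess matched)
results.insert(0, ''.join(stack))
print(''.join(results))
# interna_tion_al_ly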
You essentially implement a stack and build it backwards.
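To make the "not sure it works in all cases" caveat concrete, here is a minimal sketch that wraps the same backwards scan in a function. The name split_from_end and the shortened demo_suffixes list are assumptions for illustration only.

```python
def split_from_end(word, suffixes):
    # Hypothetical wrapper around the backwards scan described above
    stack = []
    results = []
    for ch in reversed(word):
        stack.insert(0, ch)
        guess = ''.join(stack)
        if guess in suffixes:
            results.insert(0, f'_{guess}')
            stack = []
    results.insert(0, ''.join(stack))  # leftover letters form the root
    return ''.join(results)

demo_suffixes = ["er", "ing", "tion", "al", "ly"]  # shortened list, for illustration
print(split_from_end('internationally', demo_suffixes))  # interna_tion_al_ly
print(split_from_end('singer', demo_suffixes))  # s_ing_er
```

The second call shows the limitation: when the root itself ends in something that looks like a suffix ("sing" ends in "ing"), the scan keeps splitting. Telling roots from suffixes in such cases would need extra information, such as a dictionary of known roots.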
Upvotes: 1
Reputation: 3591
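You can check the end of the word against the list, one candidate length at a time:
max_length = max(len(suffix) for suffix in suffixes)
# Start at 1: word[-0:] would be the whole word
for suffix_length in range(1, max_length + 1):
    if suffix_length >= len(word):
        break
    if word[-suffix_length:] in suffixes:
        # split suffix
        root, suffix = word[:-suffix_length], word[-suffix_length:]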
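As a self-contained sketch of this length-based check: split_last_suffix is a hypothetical helper name, and stopping at the shortest match is an assumption, since the snippet above leaves the split step open.

```python
def split_last_suffix(word, suffixes):
    # Hypothetical helper: split off the shortest listed suffix at the end of word
    suffix_set = set(suffixes)  # set membership is faster than scanning a list
    max_length = max(len(s) for s in suffix_set)
    for n in range(1, max_length + 1):
        if n >= len(word):
            break  # never consume the whole word
        if word[-n:] in suffix_set:
            return word[:-n], word[-n:]
    return word, None  # no listed suffix at the end

demo_suffixes = ["er", "tion", "al", "ly"]  # shortened list, for illustration
print(split_last_suffix('internationally', demo_suffixes))  # ('international', 'ly')
print(split_last_suffix('ego', demo_suffixes))  # ('ego', None)
```

Calling the helper repeatedly on the returned root until the suffix is None would peel off all trailing suffixes.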
Another tactic is to iterate through the suffixes in increasing length. You can do this by having suffixes = sorted(suffixes, key=len) before iterating through the suffixes, i.e.:
sentence = input()
suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
            "ty", "ment", "ness", "ship", "sion", "tion", "ate",
            "en", "fy", "ize", "able", "ible", "al",
            "esque", "ful", "ic", "ous", "ish", "ive",
            "less", "ed", "ing", "ly", "ward", "wise"]
suffixes = sorted(suffixes, key=len)
for x in suffixes:
    y = " _" + x
    sentence = sentence.replace(x, y)
Upvotes: 0