rereborn

Reputation: 11

Separating only suffixes from a string

I have a list of word suffixes. My aim is to split each word of the entered sentence into the suffixes from that list.

My problem is that my code also splits on these suffixes when they occur inside the root of a word. For instance:

(internationally) >> should be >> (interna _tion _al _ly), my code's output is >> (int _erna _tion _al _ly)

Note: I have "er" in my list

One solution could be to search from the end of the word. For example, the code first takes the last letter "y" and checks whether it matches the list. It doesn't, so the code adds the next letter to get "ly", which matches and is separated. It then resets and continues: "l" doesn't match, "al" does and is separated, and so on. Continuing like this, "erna" never matches, so the root is not split.

If it searched this way the problem would go away, but I couldn't find how to do it.

I would be very happy if you show me the way.

sentence = input()
suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
         "ty", "ment", "ness", "ship", "sion", "tion", "ate",
        "en", "fy", "ize", "able", "ible", "al",
        "esque", "ful", "ic", "ous", "ish", "ive",
        "less", "ed", "ing", "ly", "ward", "wise"]

for x in suffixes:
    y = " _" + x
    sentence = sentence.replace(x, y)

Upvotes: 0

Views: 880

Answers (4)

j1-lee

Reputation: 13939

Here is a way using endswith() and string slicing:

suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
            "ty", "ment", "ness", "ship", "sion", "tion", "ate",
            "en", "fy", "ize", "able", "ible", "al",
            "esque", "ful", "ic", "ous", "ish", "ive",
            "less", "ed", "ing", "ly", "ward", "wise"]

def find_suffix(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            suffix_removed = word[:-len(suffix)] # part before suffix
            return find_suffix(suffix_removed) + f' _{suffix}' # recurse
    return word # if no suffix is found, return the word as is

print(find_suffix('internationally')) # interna _tion _al _ly
print(find_suffix('egoistically')) # ego _ist _ic _al _ly

Recursion is not essential; the same can be done just with a for loop.
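
For reference, here is a minimal sketch of the loop-based variant mentioned above. The function name find_suffix_iter is made up for illustration, and it assumes the same suffix list as the question.

```python
SUFFIXES = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
            "ty", "ment", "ness", "ship", "sion", "tion", "ate",
            "en", "fy", "ize", "able", "ible", "al",
            "esque", "ful", "ic", "ous", "ish", "ive",
            "less", "ed", "ing", "ly", "ward", "wise"]

def find_suffix_iter(word, suffixes=SUFFIXES):
    """Peel suffixes off the end of word one at a time, without recursion."""
    found = []
    matched = True
    while matched:
        matched = False
        for suffix in suffixes:
            if word.endswith(suffix):
                found.append(suffix)
                word = word[:-len(suffix)]  # part before the suffix
                matched = True
                break  # rescan the shortened word from the start of the list
    # Suffixes were collected from the end of the word, so reverse them
    return word + ''.join(f' _{s}' for s in reversed(found))

print(find_suffix_iter('internationally'))  # interna _tion _al _ly
```

Like the recursive version, this takes the first suffix in list order that matches the end of the word, so the two should give the same output for the same list.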

Python 3.9 introduced the string method str.removesuffix(), which behaves essentially like the slicing in the code above. If you are using Python 3.9+, you can instead write suffix_removed = word.removesuffix(suffix) for readability (although I have not tested this since I use 3.8).
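
A rough sketch of what the removesuffix() variant might look like, assuming Python 3.9 or later. The name find_suffix_39 and the shortened suffix list are only for illustration.

```python
# Shortened list for brevity; the full list from the question works the same way
SUFFIXES = ["er", "tion", "al", "ly"]

def find_suffix_39(word, suffixes=SUFFIXES):
    for suffix in suffixes:
        if word.endswith(suffix):
            # str.removesuffix() (Python 3.9+) strips the suffix only
            # when the string actually ends with it
            suffix_removed = word.removesuffix(suffix)
            return find_suffix_39(suffix_removed, suffixes) + f' _{suffix}'
    return word  # no suffix matched

print(find_suffix_39('internationally'))  # interna _tion _al _ly
```

The endswith() check is still needed here: removesuffix() returns the string unchanged when there is no match, so without the check the function could not tell whether anything was removed.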


Per OP's request, the following is a function that applies the above to each word in a sentence.

def suffixify_sentence(sentence):
    return ' '.join(find_suffix(word) for word in sentence.split())

sentence = 'humanity internationally faithfully picturesque'
print(suffixify_sentence(sentence)) # humani _ty interna _tion _al _ly faith _ful _ly pictur _esque

Upvotes: 2

wjandrea

Reputation: 33022

str.replace() is the problem. It replaces the substring anywhere, not just at the end. Instead you can use str.endswith() or if you're using 3.9+, str.removesuffix().
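
A quick demonstration of the difference, using the word from the question:

```python
word = 'internationally'

# str.replace() rewrites every occurrence, including the "er" inside the root
print(word.replace('er', ' _er'))  # int _ernationally

# str.endswith() only looks at the tail of the string
print(word.endswith('er'))  # False
print(word.endswith('ly'))  # True

# Slicing off a matched suffix leaves the rest of the word untouched
print(word[:-len('ly')] + ' _ly')  # international _ly
```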

Here's an iterative implementation using str.endswith().

def remove_suffixes(string, suffixes):
    """
    Remove all suffixes from string. Return the root and suffixes.

    >>> remove_suffixes('smartly', ['y', 'ly'])
    ('smart', ['ly'])
    """
    # Sort to ensure the longest ones match first
    suffixes = sorted(suffixes, key=len, reverse=True)
    removed = []
    prev = None  # Loop variable
    while prev != string:  # i.e. break if unchanged
        prev = string  # Copy for next loop
        for suffix in suffixes:
            if string.endswith(suffix):
                removed.append(suffix)
                string = string[:-len(suffix)]
    return string, removed[::-1]

suffixes = [
    "acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
    "ty", "ment", "ness", "ship", "sion", "tion", "ate",
    "en", "fy", "ize", "able", "ible", "al",
    "esque", "ful", "ic", "ous", "ish", "ive",
    "less", "ed", "ing", "ly", "ward", "wise"]

s_out, found = remove_suffixes('internationally', suffixes)
# -> 'interna', ['tion', 'al', 'ly']
print(s_out, *found, sep=' _')  # -> interna _tion _al _ly

Upvotes: 2

fthomson

Reputation: 789

I'm not sure if your algorithm will work in all cases, but it seemed fun to implement, so here it is:

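# Uses the suffixes list from the question
sentence = 'internationally'
stack = []    # letters collected so far, scanning right to left
results = []
for i in sentence[::-1]:
    stack.insert(0, i)
    guess = ''.join(stack)
    if guess in suffixes:
        results.insert(0, f'_{guess}')
        stack = []

# Whatever is left on the stack is the root. Inserting guess here would
# repeat the last suffix when the whole word is consumed.
results.insert(0, ''.join(stack))

print(''.join(results))
# interna_tion_al_ly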

You essentially implement a stack and build the result backwards.
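
The same idea wrapped in a function, as a sketch only. The name split_from_end and the shortened suffix list in the example call are made up for illustration.

```python
def split_from_end(word, suffixes):
    """Scan word right to left, cutting whenever the collected tail is a known suffix."""
    suffix_set = set(suffixes)  # set membership is cheaper than scanning a list
    pieces = []
    tail = ''
    for char in reversed(word):
        tail = char + tail
        if tail in suffix_set:
            pieces.append('_' + tail)
            tail = ''  # start collecting the next piece
    pieces.append(tail)  # unmatched remainder is the root (may be empty)
    return ''.join(reversed(pieces))

print(split_from_end('internationally', ['er', 'tion', 'al', 'ly']))
# interna_tion_al_ly
```

Because the tail is compared as a whole, "er" is never matched inside "erna". The flip side is that the scan always cuts at the shortest matching tail, so results can differ from the endswith() approaches when one suffix in the list is itself the ending of another.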

Upvotes: 1

Acccumulation

Reputation: 3591

You can do

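word = 'internationally'  # example input
max_length = max(len(suffix) for suffix in suffixes)
# Start at 1: word[-0:] is the whole word, not an empty slice
for suffix_length in range(1, max_length + 1):
    if suffix_length >= len(word):
        break
    if word[-suffix_length:] in suffixes:
        # split off the matching suffix
        word, suffix = word[:-suffix_length], word[-suffix_length:]
        break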
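
A fuller sketch of this slice-based check, repeated until nothing more matches. The function name split_suffixes is hypothetical, and it assumes a non-empty suffix list (max() raises ValueError otherwise).

```python
def split_suffixes(word, suffixes):
    """Repeatedly cut the shortest matching suffix off the end of word."""
    suffix_set = set(suffixes)
    max_length = max(len(suffix) for suffix in suffix_set)
    found = []
    while True:
        # Never try a slice as long as the word itself, so some root always remains
        longest = min(max_length, len(word) - 1)
        for suffix_length in range(1, longest + 1):
            tail = word[-suffix_length:]
            if tail in suffix_set:
                found.append(tail)
                word = word[:-suffix_length]
                break
        else:
            break  # no slice of any length matched: stop
    return word, found[::-1]

root, found = split_suffixes('internationally', ['er', 'tion', 'al', 'ly'])
print(root, *found, sep=' _')  # interna _tion _al _ly
```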

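Another tactic is to iterate through the suffixes in increasing length. You can do this by having suffixes = sorted(suffixes, key=len) before iterating through the suffixes. Note that this only changes the order of the replacements: str.replace() still matches anywhere in the word, so the "er" inside the root is still split. The sorted version looks like this: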

sentence = input()
suffixes = ["acy", "ance", "ence", "dom", "er", "or", "ism", "ist",
     "ty", "ment", "ness", "ship", "sion", "tion", "ate",
    "en", "fy", "ize", "able", "ible", "al",
    "esque", "ful", "ic", "ous", "ish", "ive",
    "less", "ed", "ing", "ly", "ward", "wise"]

suffixes = sorted(suffixes, key = len)
for x in suffixes:
    y = " _" + x
    sentence = sentence.replace(x, y)

Upvotes: 0
