torabi

Reputation: 51

Problems with verb stemming in Python

I want to find the stems of verbs. I put the suffixes that I want to delete in a list. The problem is that the function only handles the first item in the list: it ignores the rest of the items and returns the verb without stemming. How should I change it so that it goes through all the items in the list?

def stemming (verb):
    suffix=["ing", "ed", "es", "s"]
    for i in suffix:
        stem=verb.replace(i, "")
        return stem
        i+=1

>>> stemming ("wanting")
'want'
>>> stemming ("wanted")
'wanted'

Upvotes: 2

Views: 758

Answers (5)

Juergen

Reputation: 12738

You put the return statement inside the loop, which causes the wrong behavior. I think you meant something like this:

def stemming (verb):
    suffix=["ing", "ed", "es", "s"]
    stem = verb
    for i in suffix:
        stem=stem.replace(i, "")
    return stem

I also removed the i += 1, which is indeed useless. The other thing, of course, is that you must always operate on the same variable to keep all the changes (either stem or verb; I used stem for clarity, but you can also use verb and get rid of the assignment).

As one commenter pointed out, your algorithm behaves oddly on some verbs.

I would suggest changing it this way:

def stemming (verb):
    suffixes = ["ing", "ed", "es", "s"]
    stem = verb
    for suffix in suffixes:
        if stem.endswith(suffix):
            stem = stem[:-len(suffix)]
            break
    return stem

With this change, only one suffix is removed (because of the break), and the removal only takes place at the end of the verb.
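To see why the end-only check matters, here is a small comparison, offered as a sketch rather than a definitive implementation. The names SUFFIXES, strip_anywhere and strip_at_end are made up for this illustration; they mirror the replace-based and endswith-based versions above.

```python
SUFFIXES = ["ing", "ed", "es", "s"]


def strip_anywhere(verb):
    # Hypothetical name: removes every occurrence of every suffix,
    # wherever it appears in the word.
    stem = verb
    for suffix in SUFFIXES:
        stem = stem.replace(suffix, "")
    return stem


def strip_at_end(verb):
    # Hypothetical name: removes at most one suffix, and only when
    # the word actually ends with it.
    for suffix in SUFFIXES:
        if verb.endswith(suffix):
            return verb[:-len(suffix)]
    return verb


for word in ["wanted", "singing", "misses"]:
    print(word, repr(strip_anywhere(word)), repr(strip_at_end(word)))
```

For "wanted" both functions agree, but for "singing" and "misses" the replace-based version also eats letters from the middle of the word.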

Upvotes: 2

James Mills

Reputation: 19050

Because you return too early. The moment Python encounters a return inside the enclosing function, it returns immediately to the caller (the code that called stemming).
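A minimal sketch of that effect, using two made-up helper functions (first_only and last_one are not part of the question's code):

```python
def first_only(items):
    for item in items:
        return item  # leaves the function during the first pass of the loop
    return None  # reached only when items is empty


def last_one(items):
    result = None
    for item in items:
        result = item  # keeps looping; each pass overwrites result
    return result  # runs once, after the loop has finished


print(first_only(["ing", "ed", "es", "s"]))
print(last_one(["ing", "ed", "es", "s"]))
```

The first call never looks past "ing", which is the situation in the question; the second visits every item before returning.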

Change your function stemming to:

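def stemming(verb):
    suffixes = ["ing", "ed", "es", "s"]
    for suffix in suffixes:
        # Note: each pass starts again from the original verb,
        # so only the last replacement is kept in stem.
        stem = verb.replace(suffix, "")
    return stem  # XXX: Moving the return outside of the loop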

You also do not need to increment i here. It is not an integer anyway: you are iterating over a list of strings. On each iteration over suffixes (I renamed some variables to be more readable), suffix takes in turn the value of each string in the list.


If you really wanted to use a counter variant to index suffixes:

def stemming(verb):
    suffixes = ["ing", "ed", "es", "s"]
    i = 0
    while i < len(suffixes):
        stem = verb.replace(suffixes[i], "")
        i += 1
    return stem

However, this is really unnecessary, as you can just use normal and more Pythonic iteration over the list: for suffix in suffixes:


I also believe your function is meant to be:

Code:

def stemming(verb):
    suffixs = ["ing", "ed", "es", "s"]
    for suffix in suffixs:
        verb = verb.replace(suffix, "")
    return verb

Output:

>>> stemming("singing")
''

Think about it! :)


As an aside, you really should be using nltk for stemming anyway, unless you are just doing this for educational purposes.

See: nltk.stem

Example:

>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem("singing")
'sing'  # NOT an empty string!!!
>>> st.stem("wanting")
'want'
>>> st.stem("wanted")
'want'

Upvotes: 4

Delimitry

Reputation: 3037

Move the return out of the loop and remove i += 1, which is useless here:

def stemming(verb):
    suffix=["ing", "ed", "es", "s"]
    for i in suffix:
        verb=verb.replace(i, "")
    return verb

Upvotes: 3

khelwood

Reputation: 59185

As soon as your function returns, it is finished. It doesn't carry on going through the loop replacing more stuff. I think what you actually want is something more like this:

def stemming(verb):
    suffixes = ["ing", "ed", "es", "s"]
    for suffix in suffixes:
        if verb.endswith(suffix):
            return verb[:-len(suffix)]
    return verb

So it checks whether the verb ends with each suffix in turn, returns the stem as soon as one matches, and returns the verb unchanged if none does.
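One consequence of returning on the first match is that the order of the suffix list matters. The sketch below uses a hypothetical helper, strip_suffix, that applies the same logic but takes the suffix list as a parameter so the two orderings can be compared:

```python
def strip_suffix(verb, suffixes):
    # Hypothetical helper: same logic as above, with the suffix list
    # passed in so that different orderings can be tried.
    for suffix in suffixes:
        if verb.endswith(suffix):
            return verb[:-len(suffix)]
    return verb


print(strip_suffix("watches", ["ing", "ed", "es", "s"]))  # "es" is tried before "s"
print(strip_suffix("watches", ["s", "es", "ed", "ing"]))  # "s" matches first
```

With "es" listed before "s" the result is "watch"; with "s" first it is "watche", so longer suffixes should generally come earlier in the list.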

Upvotes: 2

Emil Vikström

Reputation: 91983

return will always end the function and return to where you called it. Use yield instead of return if you want to generate multiple values.

As a side note, remove the increment of i, because i is not an integer in your code.
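As a rough illustration of the yield suggestion above, here is a sketch of a generator that produces one candidate stem per matching suffix. The name candidate_stems and its default suffix tuple are assumptions made for this example, not something from the question:

```python
def candidate_stems(verb, suffixes=("ing", "ed", "es", "s")):
    # Hypothetical generator: yields one candidate stem for every
    # suffix that the verb ends with, instead of stopping at the first.
    for suffix in suffixes:
        if verb.endswith(suffix):
            yield verb[:-len(suffix)]


print(list(candidate_stems("watches")))
print(list(candidate_stems("wanted")))
print(list(candidate_stems("run")))
```

Unlike return, yield hands a value back and then resumes the loop when the next value is requested, so the caller gets every match and can decide which one to keep.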

Upvotes: 1
