Jone Mamni

Reputation: 223

Word stemmer class in C#

I am trying the following stemming class:

static class StemmerSteps
{
    public static string stepSufixremover(this string str, string suffex)
    {
        if (str.EndsWith(suffex))
        {
            // ...
        }
        return str;
    }

    public static string stepPrefixemover(this string str, string prefix)
    {
        if (str.StartsWith(prefix))
        {
            // ...
        }
        return str;
    }
}

This class works with a single prefix or suffix. Is there a way to pass a list of prefixes or suffixes to the class and compare each of them against str? Any help is really appreciated.

Upvotes: 1

Views: 3422

Answers (4)

Victor Stoddard

Reputation: 3721

The simplest code would involve regular expressions.

For example, this would identify some English suffixes:

'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
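In C#, that pattern could be applied with System.Text.RegularExpressions roughly as in this minimal sketch. RegexStemmer and StripSuffix are hypothetical names. The sketch removes at most one suffix per call, and because it is purely pattern-based it also produces non-words, e.g. running becomes runn.

```csharp
using System.Text.RegularExpressions;

static class RegexStemmer
{
    // The pattern from the answer: a lazily matched stem (group 1) followed by
    // an optional suffix (group 2) that must end the word.
    static readonly Regex SuffixPattern =
        new Regex("^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$");

    // Hypothetical helper: returns the word with at most one listed suffix removed.
    public static string StripSuffix(string word)
    {
        Match m = SuffixPattern.Match(word);
        return m.Success ? m.Groups[1].Value : word;
    }
}
```

Because the stem is matched lazily, the suffix that starts earliest in the word wins: ladies gives lad (stripping ies) rather than ladie (stripping only s).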

One problem is that stemming is not as accurate as lemmatization. Lemmatization would require part-of-speech (POS) tagging to be accurate. For example, you don't want to relate dove to diving (dive + -ing) if dove is a noun.

Another problem is that some suffixes also require prefixes. For example, the -ment suffix in en-rich-ment requires the en- prefix on the root -rich-, unlike a root such as -govern-, which takes the suffix without any prefix (govern-ment).

Upvotes: 0

Tigran

Reputation: 62276

EDIT

Considering your comment:

"just want to look if the string starts-/endswith any of the passed strings"

Maybe something like this fits your needs:

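using System.Collections.Generic;
using System.Linq;
static class StemmerSteps
{
    public static string stepSufixremover(this string str, IEnumerable<string> suffex)
    {
        // Pick the longest matching suffix; SingleOrDefault would throw if several match.
        string suf = suffex.Where(x => str.EndsWith(x))
                           .OrderByDescending(x => x.Length).FirstOrDefault();
        if (!string.IsNullOrEmpty(suf))
        {
            str = str.Remove(str.Length - suf.Length, suf.Length);
        }
        return str;
    }
}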

If you call it like this:

Console.WriteLine("hello".stepSufixremover(new string[] { "lo", "l" }));

it produces:

hel
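The question also asks about prefixes. A prefix remover could follow the same idea; this is a sketch, not part of the original answer. The name RemoveLongestPrefix is hypothetical, comparisons are ordinal, and when several prefixes match only the longest one is stripped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrefixSteps
{
    // Hypothetical counterpart to stepSufixremover: strips the longest prefix
    // from 'prefixes' that 'str' starts with, or returns 'str' unchanged.
    public static string RemoveLongestPrefix(this string str, IEnumerable<string> prefixes)
    {
        string pre = prefixes
            .Where(p => !string.IsNullOrEmpty(p) && str.StartsWith(p, StringComparison.Ordinal))
            .OrderByDescending(p => p.Length)
            .FirstOrDefault();
        return pre == null ? str : str.Substring(pre.Length);
    }
}
```

For example, "rewrite".RemoveLongestPrefix(new[] { "un", "re" }) returns write.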

Upvotes: 0

Ulises

Reputation: 13429

Instead of creating your own class from scratch (unless this is homework), I would definitely use an existing library. This answer provides example code that implements the Porter stemming algorithm:

https://stackoverflow.com/questions/7611455/how-to-perform-stemming-in-c

Upvotes: 2

mellamokb

Reputation: 56779

Put your suffixes/prefixes in a collection (such as a List<string>) and loop through it, applying each possible one. The collection would need to be passed into the method.

List<string> suffixes = ...; // passed into the method
foreach (string suffix in suffixes)
    if (str.EndsWith(suffix))
        str = str.Remove(str.Length - suffix.Length, suffix.Length);
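Wrapped in complete methods, the loop might look like the following sketch. RemoveSuffixes and RemovePrefixes are hypothetical names; the list is passed in as a parameter, as the answer suggests, and comparisons are ordinal. Every entry is checked once, in list order, so several affixes can be stripped and the result depends on the order of the list.

```csharp
using System;
using System.Collections.Generic;

static class LoopStemmer
{
    // Hypothetical helper: checks every suffix in list order and strips each one
    // that matches the current string, so several suffixes may be removed in sequence.
    public static string RemoveSuffixes(this string str, List<string> suffixes)
    {
        foreach (string suffix in suffixes)
        {
            if (suffix.Length > 0 && str.EndsWith(suffix, StringComparison.Ordinal))
                str = str.Remove(str.Length - suffix.Length, suffix.Length);
        }
        return str;
    }

    // Same idea for prefixes: strips each listed prefix that matches the current string.
    public static string RemovePrefixes(this string str, List<string> prefixes)
    {
        foreach (string prefix in prefixes)
        {
            if (prefix.Length > 0 && str.StartsWith(prefix, StringComparison.Ordinal))
                str = str.Substring(prefix.Length);
        }
        return str;
    }
}
```

For example, with the list { "ly", "ful" } the word hopefully becomes hope, while with { "ful", "ly" } it only becomes hopeful, because "ful" is checked before "ly" has been removed.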

Upvotes: 0
