Reputation: 195
This is similar to the question "porter stemming algorithm implementation question?" [1], but expanded.
Basically, step1b is defined as:
Step1b
`(m>0) EED -> EE    feed -> feed    agreed -> agree`
`(*v*) ED ->    plastered -> plaster    bled -> bled`
`(*v*) ING ->    motoring -> motor    sing -> sing`
My question is: why does `feed` stem to `feed` and not `fe`? All the online Porter stemmers I've tried stem it to `feed`, but from what I see, it should stem to `fe`.
My train of thought is:
`feed` does not satisfy `(m>0) EED -> EE`, because the measure of `feed` minus the suffix `eed` is `m(f) = 0`.
`feed` does satisfy `(*v*) ED ->`, because the stem `fe` that remains once the suffix `ed` is removed contains a vowel. So it should stem to `fe` at this point.
Can someone explain to me how online Porter stemmers manage to stem it to `feed`?
Thanks.
Upvotes: 1
Views: 642
Reputation: 21
It's really sad that nobody here actually read the question. This is why `feed` doesn't get stemmed to `fe` by rule 2 of step 1b:
The definition of the algorithm states:
In a set of rules written beneath each other, only one is obeyed, and this will be the one with the longest matching S1 for the given word.
It isn't clearly stated that the conditions are ignored when picking the rule, but they are. So `feed` does match the first rule (which isn't applied, since its condition isn't met), and therefore the rest of the rules in 1b are ignored.
The code would approximately look like this:
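// 1b
// getMvalueOfStem() and containsVowel() are helpers assumed to be defined elsewhere.
if (word.ends_with("eed")) {                 // (m > 0) EED -> EE
    std::string wordStem = word.substr(0, word.size() - 3);
    if (getMvalueOfStem(wordStem) > 0) {
        word.pop_back();                     // "eed" -> "ee": drop the final "d"
    }
}
else if (word.ends_with("ed")) {             // (*v*) ED -> NULL
    std::string wordStem = word.substr(0, word.size() - 2);
    if (containsVowel(wordStem)) {
        word.erase(word.size() - 2);         // remove "ed"
    }
}
else if (word.ends_with("ing")) {            // (*v*) ING -> NULL
    std::string wordStem = word.substr(0, word.size() - 3);
    if (containsVowel(wordStem)) {
        word.erase(word.size() - 3);         // remove "ing"
    }
}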
The important things here are the `else if`s.
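For anyone who wants to run this, here is a self-contained sketch of the same idea in C++. The helper names (`isConsonant`, `measure`, `containsVowel`, `endsWith`, `stemBefore`, `step1b`) are my own and not taken from any library, words are assumed to be lowercase ASCII, and the cleanup rules that the full algorithm applies after a successful ED/ING removal are left out. Treat it as an illustration of the rule selection, not a complete step 1b.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A letter is a consonant unless it is a, e, i, o, u,
// or a 'y' that is preceded by a consonant.
bool isConsonant(const std::string& w, std::size_t i) {
    switch (w[i]) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            return false;
        case 'y':
            return i == 0 || !isConsonant(w, i - 1);
        default:
            return true;
    }
}

// m = number of vowel-run -> consonant-run transitions (VC pairs) in the stem.
int measure(const std::string& stem) {
    int m = 0;
    bool prevVowel = false;
    for (std::size_t i = 0; i < stem.size(); ++i) {
        const bool vowel = !isConsonant(stem, i);
        if (prevVowel && !vowel) ++m;
        prevVowel = vowel;
    }
    return m;
}

// The (*v*) condition: the stem contains at least one vowel.
bool containsVowel(const std::string& stem) {
    for (std::size_t i = 0; i < stem.size(); ++i) {
        if (!isConsonant(stem, i)) return true;
    }
    return false;
}

bool endsWith(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Part of `word` that precedes a suffix of length n.
std::string stemBefore(const std::string& word, std::size_t n) {
    assert(n <= word.size());
    return word.substr(0, word.size() - n);
}

// Simplified step 1b: the longest matching suffix picks the rule,
// and only then is that rule's condition checked.
std::string step1b(std::string word) {
    if (endsWith(word, "eed")) {                       // (m > 0) EED -> EE
        if (measure(stemBefore(word, 3)) > 0) word.pop_back();
    } else if (endsWith(word, "ed")) {                 // (*v*) ED ->
        if (containsVowel(stemBefore(word, 2))) word.erase(word.size() - 2);
    } else if (endsWith(word, "ing")) {                // (*v*) ING ->
        if (containsVowel(stemBefore(word, 3))) word.erase(word.size() - 3);
    }
    return word;
}
```

With this sketch, `step1b("feed")` returns `feed`: the `eed` branch is entered, its condition fails, and the `ed` branch is never tried. `step1b("agreed")` returns `agree`.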
Upvotes: 0
Reputation: 1
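Here m counts the vowel-consonant (VC) pairs in the stem that precedes the suffix. In `feed` the stem before `eed` is `f`, which contains no such pair, so m = 0 and the condition m > 0 fails.
In `agreed` the stem is `agr`, whose VC pair is `ag`, so m = 1 and `agreed` is replaced by `agree`.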
Upvotes: 0
Reputation: 1
The rules for removing a suffix are given in the form
(condition) S1 -> S2
This means that if a word ends with the suffix S1, and the stem before S1 satisfies the given condition, S1 is replaced by S2. The condition is usually given in terms of m, e.g.
(m > 1) EMENT ->
Here S1 is 'EMENT' and S2 is null. This would map REPLACEMENT to REPLAC, since REPLAC is a word part for which m = 2.
Now, in your example:
(m>0) EED -> EE    feed -> feed
Before 'EED', are there vowel(s) followed by consonant(s), repeated more than zero times? The answer is no: before 'EED' there is only "F", which contains no vowel(s) followed by consonant(s).
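If it helps, the `(m > k) S1 -> S2` form can be sketched in a few lines of C++. The function names (`isConsonant`, `measure`, `applyMeasureRule`) are made up for this illustration, and lowercase ASCII input is assumed; it only covers rules whose condition is a threshold on m.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A letter is a consonant unless it is a, e, i, o, u,
// or a 'y' that is preceded by a consonant.
bool isConsonant(const std::string& w, std::size_t i) {
    switch (w[i]) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            return false;
        case 'y':
            return i == 0 || !isConsonant(w, i - 1);
        default:
            return true;
    }
}

// m = how many times a run of vowels is followed by a run of consonants.
int measure(const std::string& stem) {
    int m = 0;
    bool prevVowel = false;
    for (std::size_t i = 0; i < stem.size(); ++i) {
        const bool vowel = !isConsonant(stem, i);
        if (prevVowel && !vowel) ++m;
        prevVowel = vowel;
    }
    return m;
}

// Applies "(m > minMeasure) S1 -> S2".
// The word is returned unchanged if it does not end with S1,
// or if the stem before S1 does not satisfy the condition.
std::string applyMeasureRule(const std::string& word, const std::string& s1,
                             const std::string& s2, int minMeasure) {
    assert(!s1.empty());  // a rule needs a suffix to match
    if (word.size() < s1.size() ||
        word.compare(word.size() - s1.size(), s1.size(), s1) != 0) {
        return word;
    }
    const std::string stem = word.substr(0, word.size() - s1.size());
    return measure(stem) > minMeasure ? stem + s2 : word;
}
```

For example, `applyMeasureRule("replacement", "ement", "", 1)` gives `replac` because `measure("replac")` is 2, while `applyMeasureRule("feed", "eed", "ee", 0)` leaves `feed` alone because `measure("f")` is 0.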
Upvotes: 0
Reputation: 11
It's because the part of "feed" before the "eed" suffix ("f") doesn't have a VC (vowel/consonant) combination, therefore m = 0. To apply the "eed" rule, m > 0 is required (check the conditions for each step).
Upvotes: 0