Reputation: 4385
I have looked at pattern.en's conjugate, but it only conjugates into a few forms, and I would rather not have to sit down and program all of the exceptions to those rules that would allow me to make conjugations such as
nltk has stemming, but it doesn't seem to have the reverse operation, at least as far as I can tell from searching Stack Overflow. This seems like a very elementary NLP task, but I cannot find anything modern that does this in Python. Any general conjugation tool would be nice, although the progressive form in English has no irregularities that I know of.
I am also trying to see if there are exceptions to this rule, which might work as an alternate function:
def present_to_progressive(x):
    vowels = set(['a','e','i','o','u'])
    size = len(x)
    if size == 2:
        return x + 'ing'
    elif x[size - 2:] == 'ie':
        return x[:(size-2)] + 'ying'
    elif x[size - 1] not in vowels and x[size - 2] not in vowels:
        return x + 'ing'
    elif x[size - 1] == 'e' and x[size-2] not in vowels:
        return x[0:(size-1)] + 'ing'
    elif x[size - 1] not in vowels and x[size-2] in vowels:
        if x[size - 3] not in vowels:
            return x + x[size-1] + 'ing'
        else:
            return x + 'ing'
    else:
        return x + 'ing'
Edit: Added case for "ie" verbs
Upvotes: 3
Views: 1622
Reputation: 4993
There is an entire library for this type of modification that does what you want. It is called pattern.en, and you can find it here: pattern.en
It is a good source.
Here is an excerpt from the conjugation tutorial on the site:
conjugate(verb,
    tense = PRESENT,        # INFINITIVE, PRESENT, PAST, FUTURE
    person = 3,             # 1, 2, 3 or None
    number = SINGULAR,      # SG, PL
    mood = INDICATIVE,      # INDICATIVE, IMPERATIVE, CONDITIONAL, SUBJUNCTIVE
    aspect = IMPERFECTIVE,  # IMPERFECTIVE, PERFECTIVE, PROGRESSIVE
    negated = False,        # True or False
    parse = True)
It is quite useful and very expansive!
Upvotes: 3
Reputation: 1430
I think your code covers most cases. I checked with a list of 620 irregular verbs taken from this site and it misses approximately 84 cases.
with open('/tmp/Verblist.vrb', 'rt') as f:
    err = 0
    for l in f:
        if l.startswith('>'):
            forms = l[1:].split(' ')
            guess = present_to_progressive(forms[0])
            if forms[4].lower() != guess.lower():
                print('CHECK: {} {} {}'.format(forms[0], forms[4], guess))
                err += 1
    print(err)
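The verb list file itself is not reproduced here, so as a rough sketch the same loop can be written as a function over any iterable of lines, which makes it easy to try on a few hand-written entries. The name count_mismatches, the sample lines, and their column order are hypothetical; the only things taken from the script above are the '>' prefix, the base form in column 0, and the progressive form in column 4. It also uses split() rather than split(' '), which is my choice to tolerate trailing newlines.

```python
def count_mismatches(lines, guess_fn, base_index=0, progressive_index=4):
    """Collect (base, expected, guess) triples where guess_fn disagrees with the list."""
    mismatches = []
    for line in lines:
        # Only lines starting with '>' carry verb forms, as in the script above.
        if not line.startswith('>'):
            continue
        forms = line[1:].split()
        if len(forms) <= progressive_index:
            continue  # skip malformed or short lines
        base, expected = forms[base_index], forms[progressive_index]
        guess = guess_fn(base)
        if expected.lower() != guess.lower():
            mismatches.append((base, expected, guess))
    return mismatches


# Hypothetical sample lines; the real file's columns may differ.
sample = [
    '>Swim Swam Swum Swims Swimming',
    '>Eat Ate Eaten Eats Eating',
    'a line without the marker is ignored',
]


def naive(verb):
    # Stand-in guesser: swap in present_to_progressive to test the real rule.
    return verb + 'ing'


for base, expected, guess in count_mismatches(sample, naive):
    print('CHECK: {} {} {}'.format(base, expected, guess))
```

Passing the guesser in as an argument makes it simple to compare two versions of the rule on the same list.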
Just by adding 'w' and 'y' to your list of vowels, the list of possible mistakes goes down to 18 cases:
CHECK: Aby/Abey Abying/Abeying Aby/Abeying -- Correct
CHECK: Eat Eating Eatting
CHECK: Fordo/Foredo Fordoing Fordo/Foredoing -- Correct in one of the 2 variants
CHECK: Forget Foregetting Forgetting -- Correct, the list has a typo
CHECK: Lie Lying Lieing -- Fixed in your second version
CHECK: Mischoose Mischoosins Mischoosing -- Correct, the list has a typo
CHECK: Miswed Miswedding Misweding
CHECK: Outswim Outswimming Outswiming
CHECK: Overlie Overlying Overlieing -- Fixed in your second version
CHECK: Quit Quitting Quiting
CHECK: Relearn Relearn Relearning
CHECK: Rewed Rewedding Reweding
CHECK: Rewet Rewetting Reweting
CHECK: Rewin Rewinning Rewining
CHECK: Swim Swimming Swiming
CHECK: Underlie Underlying Underlieing -- Fixed in your second version
CHECK: Vex Vexing Vexxing
CHECK: Zinc Zincking Zincing
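To make the effect of adding 'w' and 'y' concrete, here is a small sketch of the question's function with the vowel set turned into a parameter. The parameter and the EXTENDED name are my additions for illustration; the branching is otherwise the same as in the question.

```python
def present_to_progressive(x, vowels=frozenset('aeiou')):
    size = len(x)
    if size == 2:
        return x + 'ing'
    elif x[size - 2:] == 'ie':
        return x[:size - 2] + 'ying'
    elif x[size - 1] not in vowels and x[size - 2] not in vowels:
        return x + 'ing'
    elif x[size - 1] == 'e' and x[size - 2] not in vowels:
        return x[:size - 1] + 'ing'
    elif x[size - 1] not in vowels and x[size - 2] in vowels:
        # Consonant-vowel-consonant ending: double the final consonant.
        if x[size - 3] not in vowels:
            return x + x[size - 1] + 'ing'
        else:
            return x + 'ing'
    else:
        return x + 'ing'


# Hypothetical name: the original vowels plus 'w' and 'y'.
EXTENDED = frozenset('aeiouwy')

for verb in ['show', 'play', 'swim']:
    print(verb,
          present_to_progressive(verb),
          present_to_progressive(verb, EXTENDED))
```

With the original vowels, 'show' and 'play' get a doubled final letter ('showwing', 'playying'); with the extended set they come out right, but 'swim' now loses its doubling ('swiming') because the 'w' before the 'i' is treated as a vowel. That is the same trade-off visible in the list above.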
The most important of these could be addressed by adding the special case "lie" and improving the rule on doubling the last consonant. I guess you can safely ignore some very uncommon verbs.
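One possible way to tighten the doubling rule, as a sketch only: keep 'a', 'e', 'i', 'o', 'u' as the vowels, but never double a final 'w', 'x' or 'y', and handle the leftovers through a small override table. The names to_progressive, NO_DOUBLE and EXCEPTIONS are hypothetical, and the two override entries are just the cases from the list above that this rule still gets wrong.

```python
VOWELS = set('aeiou')
NO_DOUBLE = set('wxy')  # final letters that are not doubled before -ing

# Hand-maintained overrides (illustrative, taken from the CHECK list above).
EXCEPTIONS = {
    'quit': 'quitting',
    'zinc': 'zincking',
}


def to_progressive(verb):
    """Heuristic -ing form of an English verb; returns lowercase."""
    v = verb.lower()
    if v in EXCEPTIONS:
        return EXCEPTIONS[v]
    if v.endswith('ie'):
        # lie -> lying, underlie -> underlying
        return v[:-2] + 'ying'
    if len(v) > 2 and v.endswith('e') and v[-2] not in VOWELS:
        # make -> making; 'be' and 'see' are left alone
        return v[:-1] + 'ing'
    if (len(v) >= 3
            and v[-1] not in VOWELS | NO_DOUBLE
            and v[-2] in VOWELS
            and v[-3] not in VOWELS):
        # consonant-vowel-consonant ending: swim -> swimming
        return v + v[-1] + 'ing'
    return v + 'ing'


for verb in ['swim', 'rewed', 'vex', 'show', 'eat', 'lie', 'quit']:
    print(verb, to_progressive(verb))
```

This still over-doubles verbs whose last syllable is unstressed, such as 'open' and 'visit', since spelling alone does not carry stress. Those would need either more override entries or a pronunciation source.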
Upvotes: 1