Bruce

Reputation: 35285

How to remove plurals in a list of nouns?

I have a list of strings:

['bill', 'simpsons', 'cosbys', 'cosby','bills','mango', 'mangoes']

What is the best way to remove all the plurals from this list? I want the output to be:

['bill', 'simpsons', 'cosby','mango']

Upvotes: 6

Views: 6183

Answers (4)

Suzana

Reputation: 4420

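With the NodeBox Linguistics library it takes only two lines:

import en  # the NodeBox Linguistics package
noun_list = ['bill', 'simpsons', 'cosbys', 'cosby', 'bills', 'mango', 'mangoes']  # list from the question
# keep only the words that are already in singular form
only_singulars = [w for w in noun_list if w == en.noun.singular(w)]

The library implements Conway's pluralization rules, which cover many exceptional cases.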

Upvotes: 5

Anthony Blake

Reputation: 5348

In general, this process is called 'stemming', and there is a Python package named 'stemming' that implements it.

Used like so:

from stemming.porter2 import stem
stem("simpsons")

Stemming does more than strip plurals, but you could modify the stemming package so that it performs only the plural step. Take a look at the source: http://tartarus.org/martin/PorterStemmer/python.txt
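To illustrate what a plural-only stemmer might look like, here is a minimal sketch based on step 1a of the original Porter algorithm (SSES -> SS, IES -> I, SS -> SS, S -> nothing). The function name strip_plural is hypothetical and this is not code taken from the stemming package:

```python
def strip_plural(word):
    """Apply only the plural-removal rules of Porter step 1a (a sketch)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress stays caress
    if word.endswith("s"):
        return word[:-1]   # bills -> bill
    return word


for w in ["bills", "cosbys", "simpsons", "mangoes"]:
    print(w, "->", strip_plural(w))
```

Note that a stem is not always a dictionary word: these rules turn "mangoes" into "mangoe" and "simpsons" into "simpson", so the result would still need checking against the expected output.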

Upvotes: 6

Kendall Lister

Reputation: 335

This is not possible unless extra information is supplied. For example, will all strings in your list be English words? Will they be nouns? If so, there appear to be several stemming packages for Python that presumably do a good job in most cases, but you will have more success the more strictly you can define your requirements. And if the list is created from user input, the user might not agree with the results of your processing; consider "octopi", "indices", et cetera.

Upvotes: -1

Raymond Hettinger

Reputation: 226674

Pluralization rules have many corner cases. Perhaps you can bypass a rules-based approach and use a dictionary lookup that maps each plural form to its singular form.
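A minimal sketch of the lookup idea, assuming a small hand-made table: the PLURAL_TO_SINGULAR mapping and the remove_plurals function are hypothetical, and a real table would have to come from a proper lexicon. Words missing from the table are kept unchanged.

```python
# Hypothetical plural -> singular table; only covers the question's examples.
PLURAL_TO_SINGULAR = {
    "bills": "bill",
    "cosbys": "cosby",
    "mangoes": "mango",
}


def remove_plurals(words, table=PLURAL_TO_SINGULAR):
    """Map known plurals to singulars and drop duplicates, keeping order."""
    seen = set()
    result = []
    for word in words:
        singular = table.get(word, word)  # unknown words pass through
        if singular not in seen:
            seen.add(singular)
            result.append(singular)
    return result


nouns = ['bill', 'simpsons', 'cosbys', 'cosby', 'bills', 'mango', 'mangoes']
print(remove_plurals(nouns))  # ['bill', 'simpsons', 'cosby', 'mango']
```

Because "simpsons" is not in the table it is left alone, which matches the output asked for in the question.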

Upvotes: 1
