Reputation: 35285
I have a list of strings:
['bill', 'simpsons', 'cosbys', 'cosby','bills','mango', 'mangoes']
What is the best way to remove all the plurals from this list? I want the output to be:
['bill', 'simpsons', 'cosby','mango']
Upvotes: 6
Views: 6183
Reputation: 4420
With the NodeBox Linguistics library it takes only two lines (noun_list is the list from the question):
import en
only_singulars = [w for w in noun_list if w == en.noun.singular(w)]
The library implements Conway's pluralization rules, which cover many kinds of exceptional cases.
Upvotes: 5
Reputation: 5348
In general, the process is called 'stemming', and there is a package called 'stemming' for Python.
It is used like so:
from stemming.porter2 import stem
stem("simpsons")
Stemming does more than just stem plurals, but you could modify the stemming package to only perform the plural stemming. Take a look at the source: http://tartarus.org/martin/PorterStemmer/python.txt
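One way to read "only perform the plural stemming" is to keep just step 1a of the original Porter algorithm, which is the step that deals with plural suffixes. The sketch below is a standalone illustration, not code from the stemming package; the function name strip_plural_suffix is made up for this example. Note that the porter2 variant imported above uses a slightly different step 1a.

```python
def strip_plural_suffix(word):
    """Apply only step 1a of the original Porter stemmer.

    Rules, tried in order:
        SSES -> SS    (caresses -> caress)
        IES  -> I     (ponies   -> poni)
        SS   -> SS    (caress   -> caress)
        S    -> ""    (cats     -> cat)
    """
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


if __name__ == "__main__":
    for w in ["bills", "cosbys", "simpsons", "mangoes"]:
        print(w, "->", strip_plural_suffix(w))
```

As with any stemmer, the result is a stem rather than a dictionary word: "mangoes" becomes "mangoe" and "ponies" becomes "poni", so this is only useful for grouping words, not for producing clean singulars.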
Upvotes: 6
Reputation: 335
This is not possible unless extra information is supplied. For example, will all strings in your list be English words? Will they be nouns? If so, there appear to be several stemming packages for Python that presumably do a good job in most cases, but you will have more success the more strictly you can define your requirements. And if the list is created from user input, the user might not agree with the results of your processing; consider "octopi", "indices", et cetera.
Upvotes: -1
Reputation: 226674
Pluralization rules have many corner cases. Perhaps you can bypass a rule-based approach and use a dictionary lookup to map each plural form to its singular form.
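A minimal sketch of the lookup idea, assuming you have (or can build) a plural-to-singular table. The PLURAL_TO_SINGULAR dict and the remove_plurals function are hypothetical and hand-written for the question's data; a real table would come from a word list or lexicon. A word is dropped only when its singular is also in the list, which is what the expected output in the question implies ('simpsons' stays because 'simpson' is absent).

```python
# Hypothetical lookup table; in practice this would be loaded from a lexicon.
PLURAL_TO_SINGULAR = {
    "bills": "bill",
    "cosbys": "cosby",
    "mangoes": "mango",
    "simpsons": "simpson",
}


def remove_plurals(words):
    """Drop every word whose singular form also appears in the list."""
    present = set(words)
    # .get() returns None for words not in the table; None is never in
    # `present`, so unknown and singular words are kept.
    return [w for w in words if PLURAL_TO_SINGULAR.get(w) not in present]


if __name__ == "__main__":
    nouns = ["bill", "simpsons", "cosbys", "cosby", "bills", "mango", "mangoes"]
    print(remove_plurals(nouns))
```

The order of the input list is preserved, and irregular forms need no special handling as long as they appear in the table.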
Upvotes: 1