Reputation: 64880
Given a word, which may or may not be a singular-form noun, how would you generate its plural form?
Based on this NLTK tutorial and this informal list on pluralization rules, I wrote this simple function:
def plural(word):
    """
    Converts a word to its plural form.
    """
    if word in c.PLURALE_TANTUMS:
        # defective nouns, fish, deer, etc
        return word
    elif word in c.IRREGULAR_NOUNS:
        # foot->feet, person->people, etc
        return c.IRREGULAR_NOUNS[word]
    elif word.endswith('fe'):
        # knife -> knives
        return word[:-2] + 'ves'
    elif word.endswith('f'):
        # wolf -> wolves
        return word[:-1] + 'ves'
    elif word.endswith('o'):
        # potato -> potatoes
        return word + 'es'
    elif word.endswith('us'):
        # cactus -> cacti
        return word[:-2] + 'i'
    elif word.endswith('on'):
        # criterion -> criteria
        return word[:-2] + 'a'
    elif word.endswith('y'):
        # community -> communities
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        # box -> boxes, dish -> dishes
        return word + 'es'
    elif word.endswith('an'):
        # woman -> women
        return word[:-2] + 'en'
    else:
        return word + 's'
But I think this is incomplete. Is there a better way to do this?
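To make the gaps concrete, here is a minimal self-contained sketch of the same rule order. The `c` constants module is not shown above, so the two tables below are small hypothetical stand-ins, not its real contents, and `plural_sketch` is just an illustrative name.

```python
# Hypothetical stand-ins for the constants module `c`, which is not shown.
PLURALE_TANTUMS = {'fish', 'deer', 'sheep'}
IRREGULAR_NOUNS = {'foot': 'feet', 'person': 'people', 'child': 'children'}


def plural_sketch(word):
    """Apply the same suffix rules, in the same order, as plural() above."""
    if word in PLURALE_TANTUMS:
        return word
    if word in IRREGULAR_NOUNS:
        return IRREGULAR_NOUNS[word]
    if word.endswith('fe'):
        return word[:-2] + 'ves'
    if word.endswith('f'):
        return word[:-1] + 'ves'
    if word.endswith('o'):
        return word + 'es'
    if word.endswith('us'):
        return word[:-2] + 'i'
    if word.endswith('on'):
        return word[:-2] + 'a'
    if word.endswith('y'):
        return word[:-1] + 'ies'
    if word[-1] in 'sx' or word[-2:] in ('sh', 'ch'):
        return word + 'es'
    if word.endswith('an'):
        return word[:-2] + 'en'
    return word + 's'


# Words the suffix rules handle correctly.
good = {w: plural_sketch(w) for w in ('knife', 'wolf', 'potato', 'box', 'woman')}
# Words where a suffix rule fires too eagerly and gives a wrong plural.
bad = {w: plural_sketch(w) for w in ('day', 'bus', 'lion', 'piano', 'roof')}
```

The `bad` mapping comes out as `daies`, `bi`, `lia`, `pianoes` and `rooves`, which is the kind of over-generalization I am worried about.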
Upvotes: 14
Views: 29680
Reputation: 83427
One may use the pluralizer lib (MIT license). From the readme:
pip install pluralizer
from pluralizer import Pluralizer
pluralizer = Pluralizer()
assert pluralizer.pluralize('apple', 1, False) == 'apple'
assert pluralizer.pluralize('apple', 1, True) == '1 apple'
assert pluralizer.pluralize('apple', 2, False) == 'apples'
assert pluralizer.pluralize('apple', 2, True) == '2 apples'
assert pluralizer.plural('apple') == 'apples'
assert pluralizer.singular('apples') == 'apple'
assert pluralizer.isPlural('apples') == True
assert pluralizer.isPlural('apple') == False
assert pluralizer.isSingular('apples') == False
assert pluralizer.isSingular('apple') == True
Tested with Python 3.11 + Windows 10.
Upvotes: 0
Reputation: 29630
The pattern-en package offers pluralization:
>>> import pattern.text.en
>>> pattern.text.en.pluralize("dog")
'dogs'
Note also that in order to run the import above successfully, you may have to first execute the following (at least the first time):
>>> import nltk
>>> nltk.download('omw-1.4')
Upvotes: 35
Reputation: 71
Most current pluralization libraries do not return multiple plurals for irregular words that have them. Some libraries also do not check that the word passed in is a noun, and simply pluralize it by general rules. So I decided to build a Python library, Plurals and Countable, which is open source on GitHub. Its main purpose is to get plurals (yes, multiple plurals for some words), with an option to return only dictionary-approved plurals. It can also report whether a noun is countable, uncountable, or either.
import plurals_counterable as pluc
pluc.pluc_lookup_plurals('octopus', strict_level='dictionary')
will return the following dictionary:
{
    'query': 'octopus',
    'base': 'octopus',
    'plural': ['octopuses', 'octopi', 'octopodes'],
    'countable': 'countable'
}
If you query by a noun's plural, the return also tells which word is its base (singular or plural-only word).
The library actually looks the words up in dictionaries, so it takes some time to request, parse and return results. Alternatively, you might use the REST API provided by Dictionary.video. You'll need to contact [email protected] to get an API key. The call looks like this:
import json
import logging

import requests


def fetch_plurals(url):
    # Wrapped in a function so that the return statements are valid.
    response = requests.get(url)
    if response.status_code == 200:
        return json.loads(response.text)
    logging.error('%s response: status_code[%d]', url, response.status_code)
    return None


url = 'https://dictionary.video/api/noun/plurals/octopus?key=YOUR_API_KEY'
result = fetch_plurals(url)
Upvotes: 0
Reputation: 5087
Another option that supports Python 3 is Inflect:
import inflect
engine = inflect.engine()
plural = engine.plural(your_string)
Upvotes: 29
Reputation: 366083
First, it's worth noting that, as the FAQ explains, WordNet cannot generate plural forms.
If you want to use it anyway, you can. With Morphy, WordNet might be able to generate plurals for many nouns… but it still won't help with most irregular nouns, like "children".
Anyway, the easy way to use WordNet from Python is via NLTK. One of the NLTK HOWTO docs explains the WordNet Interface. (Of course it's even easier to just use NLTK without specifying a corpus, but that's not what you asked for.)
There is a lower-level API to WordNet called pywordnet, but I believe it's no longer maintained (it became the foundation for the NLTK integration), and only works with older versions of Python (maybe 2.7, but not 3.x) and of WordNet (only 2.x).
Alternatively, you can always access the C API by using ctypes or cffi or building custom bindings, or access the Java API by using Jython instead of CPython.
Or, of course, you can call the command-line interface via subprocess.
Anyway, at least on some installations, if you give the simple Morphy interface a singular noun, it will return its plural, while if you give it a plural noun, it will return its singular. So:
from nltk.corpus import wordnet as wn
assert wn.morphy('dogs') == 'dog'
assert wn.morphy('dog') == 'dog'
This isn't actually documented, or even implied, to be true, and in fact it's clearly not true for the OP, so I'm not sure I'd want to rely on it (even if it happens to work on your computer).
The other way around is documented to work, so you could write some rules that apply all possible English plural rules, call morphy on each one, and the first one that returns the starting string is the right plural.
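As a rough sketch of that idea (not verified against a real WordNet install): generate candidate plurals from a few suffix rules and keep the first one that the lemmatizer maps back to the original word. In practice the `lemmatize` argument would be `wn.morphy`; the rule list, the function names, and the `toy_lemmatize` stand-in are assumptions made here so that the example runs without NLTK.

```python
def candidate_plurals(word):
    """Yield candidate plural spellings from a few common English suffix rules."""
    yield word + 's'
    yield word + 'es'
    if word.endswith('y'):
        yield word[:-1] + 'ies'
    if word.endswith('fe'):
        yield word[:-2] + 'ves'
    elif word.endswith('f'):
        yield word[:-1] + 'ves'


def plural_via_lemmatizer(word, lemmatize):
    """Return the first candidate that lemmatizes back to `word`, else None."""
    for candidate in candidate_plurals(word):
        if lemmatize(candidate) == word:
            return candidate
    return None


# Toy stand-in for wn.morphy: a fixed table of plural -> singular pairs.
TOY_LEMMAS = {'dogs': 'dog', 'boxes': 'box', 'cities': 'city', 'knives': 'knife'}


def toy_lemmatize(form):
    """Return the base form from the toy table, or None if unknown."""
    return TOY_LEMMAS.get(form)


words = ('dog', 'box', 'city', 'knife', 'child')
results = [plural_via_lemmatizer(w, toy_lemmatize) for w in words]
```

With the toy table, `child` comes back as `None` because no suffix rule produces `children`, which is the same limitation the real thing would have.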
However, the way it's documented to work is effectively by blindly applying the same kind of rules. So, for example, it will properly tell you that doges is not the plural of dog, but not because it knows dogs is the right answer; only because it knows doge is a different word, and it likes the "+s" rule more than the "+es" rule. So, this isn't going to be helpful.
Also, as explained above, it has no rules for any irregular plurals: WordNet has no idea that children and child are related in any way.
Also, wn.morphy('reckless') will return 'reckless' rather than None. If you want that, you'll have to test whether it's a noun first. You can do this just sticking with the same interface, although it's a bit hacky:
def plural(word):
    result = wn.morphy(word)
    noun = wn.morphy(word, wn.NOUN)
    if noun in (word, result):
        return result
To do this properly, you will actually need to add a plurals database instead of trying to trick WordNet into doing something it can't do.
Also, a word can have multiple meanings, and they can have different plurals, and sometimes there are even multiple plurals for the same meaning. So you probably want to start with something like (lemma for s in wn.synsets(word, wn.NOUN) for lemma in s.lemmas() if lemma.name() == word) and then get all appropriate plurals, instead of just returning "the" plural.
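A minimal sketch of what such a plurals database could look like, assuming a hand-maintained table (the `PLURALS_DB` entries and the `plurals` function are illustrative, not taken from any real library). It returns every known plural rather than a single one, and falls back to a naive "+s" guess only when the word is missing:

```python
# Hypothetical, hand-maintained database: singular -> list of accepted plurals.
PLURALS_DB = {
    'child': ['children'],
    'octopus': ['octopuses', 'octopi', 'octopodes'],
    'fish': ['fish', 'fishes'],
    'person': ['people', 'persons'],
}


def plurals(word):
    """Return all known plurals for `word`, or a naive '+s' guess if unknown."""
    known = PLURALS_DB.get(word)
    if known is not None:
        return list(known)  # copy so callers cannot mutate the table
    return [word + 's']
```

Returning a list keeps the caller honest about words with more than one accepted plural; picking one per sense would need the synset information on top of this.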
Upvotes: 5