Dan Harmon

Reputation: 323

How do I get the base form of a synonym or plural of a word in Python?

I would like to use Python to convert all synonyms and plural forms of words to the base version of the word.

For example, "babies" would become "baby", and so would "infant" and "infants".

I tried writing a naive plural-to-root converter, but it doesn't always work correctly and misses a large number of cases.

contents = ["buying", "stalls", "responsibilities"]
stems = []
for token in contents:
    if token.endswith("ies"):
        token = token[:-3] + "y"  # replace only the trailing "ies"
    elif token.endswith("s"):
        token = token[:-1]
    elif token.endswith("ed"):
        token = token[:-2]
    elif token.endswith("ing"):
        token = token[:-3]
    stems.append(token)  # reassigning token alone would leave contents unchanged

print(stems)

Upvotes: 0

Views: 486

Answers (2)

wholehope

Reputation: 71

I built a Python library, Plurals and Countable, which is open source on GitHub. Its main purpose is to get plurals (yes, multiple plurals for some words), but it also solves this particular problem.

import plurals_counterable as pluc
pluc.pluc_lookup_plurals('men', strict_level='dictionary')

This returns the following dictionary:

{
    'query': 'men', 
    'base': 'man', 
    'plural': ['men'], 
    'countable': 'countable'
}

The base field is what you need.

The library actually looks the words up in dictionaries, so it takes some time to request, parse and return the result. Alternatively, you can use the REST API provided by Dictionary.video. You'll need to contact [email protected] to get an API key. The call looks like this:

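import logging

import requests

def lookup_base(word, api_key):
    """Return the base form of word, or None if the request fails."""
    # lookup_base is just a wrapper name so that the return statements are valid.
    url = 'https://dictionary.video/api/noun/plurals/%s?key=%s' % (word, api_key)
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()['base']
    logging.error('%s response: status_code[%d]', url, response.status_code)
    return None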

Upvotes: 0

Jason K Lai

Reputation: 1540

I have not used this library before, so take this with a grain of salt. However, NodeBox Linguistics seems to be a reasonable set of scripts that will do exactly what you are looking for if you are on macOS. Check the link here: https://www.nodebox.net/code/index.php/Linguistics

Based on their documentation, it looks like you will be able to use lines like so:

print(en.noun.singular("people"))     # person

print(en.verb.infinitive("swimming"))  # swim

etc.

In addition to the example above, another option to consider is a natural language processing library like NLTK. The reason I recommend using an external library is that English has a lot of exceptions. As mentioned in my comment, consider words like class, fling, red, geese, etc., which would trip up the rules that were mentioned in the original question.
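To make the point about exceptions concrete, here is a rough sketch in plain Python (no external library) that runs the question's suffix rules over those words. The names `naive_stem`, `stem_with_exceptions` and the tiny `EXCEPTIONS` table are made up for illustration and are not part of any library; a real lemmatizer relies on a far larger dictionary than this.

```python
def naive_stem(token):
    """Apply the suffix rules from the question, in the same order."""
    if token.endswith("ies"):
        return token[:-3] + "y"
    if token.endswith("s"):
        return token[:-1]
    if token.endswith("ed"):
        return token[:-2]
    if token.endswith("ing"):
        return token[:-3]
    return token


# Hypothetical hand-written exception table, only for this illustration.
EXCEPTIONS = {"class": "class", "fling": "fling", "red": "red", "geese": "goose"}


def stem_with_exceptions(token):
    """Check the exception table first, then fall back to the suffix rules."""
    return EXCEPTIONS.get(token, naive_stem(token))


words = ["class", "fling", "red", "geese", "stalls"]
naive = [naive_stem(w) for w in words]
fixed = [stem_with_exceptions(w) for w in words]

print(naive)  # ['clas', 'fl', 'r', 'geese', 'stall']
print(fixed)  # ['class', 'fling', 'red', 'goose', 'stall']
```

Only "stalls" survives the naive rules intact. Every other word needs its own table entry, and that table grows quickly, which is why a maintained library is the better bet.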

Upvotes: 1
