Reputation: 141
How can I use the NLTK module to get both the singular and plural forms of a noun, or tell it not to differentiate between singular and plural, when searching a txt file for a word? Can I also use NLTK to make the search case-insensitive?
Upvotes: 14
Views: 20105
Reputation: 789
I tried the following code, built on the inflect library. I hope it helps.
import inflect
inflectEngine = inflect.engine()
def getSingular(word):
    # singular_noun() returns False when the word is already singular
    return word if not inflectEngine.singular_noun(word) else inflectEngine.singular_noun(word)
def getPlural(word):
    # Normalize to the singular first, then pluralize
    word = getSingular(word)
    return word if not inflectEngine.plural_noun(word) else inflectEngine.plural_noun(word)
########################################################
##################### Testing Area #####################
########################################################
# UNCONDITIONALLY FORM THE PLURAL
singularWord1 = "dog"
singularWord2 = "horse"
singularWord3 = "terretory"
pluralWord1 = "books"
pluralWord2 = "hotels"
pluralWord3 = "categories"
print("The plural/singular conversion : ", singularWord1, " => ", inflectEngine.plural(singularWord1))
print("The plural/singular conversion : ", singularWord2, " => ", inflectEngine.plural(singularWord2))
print("The plural/singular conversion : ", singularWord3, " => ", inflectEngine.plural(singularWord3))
print("The plural/singular conversion : ", pluralWord1, " => ", inflectEngine.plural(pluralWord1))
print("The plural/singular conversion : ", pluralWord2, " => ", inflectEngine.plural(pluralWord2))
print("The plural/singular conversion : ", pluralWord3, " => ", inflectEngine.plural(pluralWord3))
print("")
######################## Output ########################
# The plural/singular conversion : dog => dogs
# The plural/singular conversion : horse => horses
# The plural/singular conversion : terretory => terretories
# The plural/singular conversion : books => book
# The plural/singular conversion : hotels => hotel
# The plural/singular conversion : categories => category
#########################################################
print("The singular conversion")
print(getSingular(singularWord1))
print(getSingular(singularWord2))
print(getSingular(singularWord3))
print(getSingular(pluralWord1))
print(getSingular(pluralWord2))
print(getSingular(pluralWord3))
print("")
######################## Output ########################
# The singular conversion
# dog
# horse
# terretory
# book
# hotel
# category
#########################################################
print("The plural conversion")
print(getPlural(singularWord1))
print(getPlural(singularWord2))
print(getPlural(singularWord3))
print(getPlural(pluralWord1))
print(getPlural(pluralWord2))
print(getPlural(pluralWord3))
print("")
######################## Output ########################
# The plural conversion
# dogs
# horses
# terretories
# books
# hotels
# categories
#########################################################
# CONDITIONALLY FORM THE PLURAL
count = 1
print("I saw", count, inflectEngine.plural(singularWord1, count))
count = 5
print("I saw", count, inflectEngine.plural(singularWord1, count))
######################## Output ########################
# I saw 1 dog
# I saw 5 dogs
#########################################################
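For searching, it is often handy to get both forms of a word in one step. Below is a minimal sketch of that idea, not part of the code above. The helper name word_forms is made up for illustration, and StubEngine is a hypothetical stand-in that only handles regular "-s" nouns, so the snippet runs without inflect installed. With inflect available, passing inflect.engine() instead should work the same way, since it provides singular_noun() and plural_noun().

```python
def word_forms(word, engine):
    """Return the singular and plural forms of a noun as a set.

    `engine` is assumed to expose singular_noun() and plural_noun()
    the way inflect.engine() does.
    """
    # singular_noun() returns False for a word that is already singular,
    # so `or` falls back to the word itself with a single call.
    singular = engine.singular_noun(word) or word
    plural = engine.plural_noun(singular)
    return {singular, plural}


class StubEngine:
    """Hypothetical stand-in covering regular "-s" nouns only, so the
    sketch runs without inflect installed."""

    def singular_noun(self, word):
        return word[:-1] if word.endswith("s") else False

    def plural_noun(self, word):
        return word + "s"


forms_from_singular = word_forms("dog", StubEngine())
forms_from_plural = word_forms("dogs", StubEngine())
print(sorted(forms_from_singular))  # ['dog', 'dogs']
print(sorted(forms_from_plural))    # ['dog', 'dogs']
```

Using `or` here avoids calling singular_noun() twice, as getSingular does.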
Upvotes: 0
Reputation: 479
It might be a bit late to answer but just in case someone is still looking for something similar:
There's inflect (also available on GitHub), which supports Python 2.x and 3.x. You can find the singular or plural form of a given word:
import inflect
p = inflect.engine()
words = "cat dog child goose pants"
print([p.plural(word) for word in words.split(' ')])
# ['cats', 'dogs', 'children', 'geese', 'pant']
It is worth noting that passing an already plural word to p.plural can give you the singular form (as with 'pants' => 'pant' above).
In addition, you can provide a POS (Part Of Speech) tag, or provide a number and let the library determine whether the word needs to be plural or singular:
p.plural('cat', 4) # cats
p.plural('cat', 1) # cat
# but also...
p.plural('cat', 0) # cats
Upvotes: 3
Reputation: 1528
At the time of writing, Pattern does not support Python 3 (although there is an ongoing discussion about this at https://github.com/clips/pattern/issues/62).
TextBlob (https://textblob.readthedocs.io) is built on top of Pattern and NLTK, and also includes pluralization functionality. It seems to do a pretty good job of this, though it is not perfect. See the example code below.
from textblob import TextBlob
words = "cat dog child goose pants"
blob = TextBlob(words)
plurals = [word.pluralize() for word in blob.words]
print(plurals)
# >>> ['cats', 'dogs', 'children', 'geese', 'pantss']
Upvotes: 6
Reputation: 8071
Here's one possible way to do it with NLTK. Imagine you're searching for the word 'feature':
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
wnl = WordNetLemmatizer()
text = "This is a small text, a very small text with no interesting features."
tokens = [token.lower() for token in word_tokenize(text)]
lemmatized_words = [wnl.lemmatize(token) for token in tokens]
print('feature' in lemmatized_words)
Case sensitivity is handled by calling str.lower() on every token, and of course you also have to lowercase and lemmatize the search word if necessary.
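If you already know the forms you want to match (for example a singular and its plural), a plain regular expression from the standard library can also do the case-insensitive search. This is only a sketch under that assumption. The function name find_word_forms and the sample text are made up for illustration, and the list of forms has to come from somewhere else, such as a lemmatizer or an inflection library.

```python
import re


def find_word_forms(text, forms):
    """Return every occurrence of any of the given forms, ignoring case."""
    # Escape each form so it is matched literally, and join them with "|".
    # The \b word boundaries keep "feature" from matching inside "featureless".
    alternatives = "|".join(re.escape(form) for form in forms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.findall(text)


text = "This Feature is new. Older features were removed; a featureless build remains."
matches = find_word_forms(text, ["feature", "features"])
print(matches)  # ['Feature', 'features']
```

To search a txt file, the text could be read with open(path).read() and passed to the same function.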
Upvotes: 6