Reputation: 21
I am trying to use spaCy in Python to detect the word "grief" no matter the form, whether it is "I am grieving", "going through grief", "I grieved over __", in all caps, etc. I am pretty new to Python, so I don't know lemmatization that well, but are there some simple if statements that could solve it using spaCy?
grief = str(input(("What is currently on your mind? ")))
doc = nlp(grief)
if [t.grief for t in doc if t.lemma_ == "grie"]:
    grief1(sad_value)
Upvotes: 2
Views: 79
Reputation: 3680
To make use of the spaCy lemmatiser, you need to check for two lemmas: "grief" and "grieve". However, this doesn't catch all cases as one might initially expect (see below).
In general, one should not assume that the spaCy lemma output will be lowercase, nor that random capitalisation of letters in an input word leaves the result unchanged. For example, in the batch output further down, "gRieviNG" yields the lemma "grieving" (after lowercasing) rather than "grieve". This Medium article by Jade Moillic highlights the lowercase limitations of the spaCy lemmatiser quite well.
To handle these situations, one can force the lemma output to be lowercase and also add "grieving" as a possible lemma to check. Alternatively, stemming via NLTK's SnowballStemmer implementation provides a robust option. Solutions are as follows.
# spaCy lemmatiser, single input
import spacy
# Load the small English pipeline; NER is not needed for lemmatisation
nlp = spacy.load('en_core_web_sm', exclude=["ner"])
grief = input("What is currently on your mind? ")
# Input: "I am grieving"
doc = nlp(grief)
for t in doc:
    # Lowercase the lemma so that capitalised input still matches
    lem = t.lemma_.lower()
    if lem in ("grief", "grieve", "grieving"):
        print("Found {}".format(lem))
# Output: "Found grieve"
# spaCy lemmatiser, several inputs processed as a batch
import spacy
nlp = spacy.load('en_core_web_sm', exclude=["ner"])
texts = ["Grief is what I feel", "Grieving is not something I'm used to", "I am grieving", "Going through grief", "I will grieve", "I grieved", "He grieves", "I am gRieviNG"]
# nlp.pipe processes the texts as a stream, which is faster than calling nlp on each one
docs = list(nlp.pipe(texts))
for doc in docs:
    print(doc.text)
    for t in doc:
        lem = t.lemma_.lower()
        if lem in ("grief", "grieve", "grieving"):
            print("\t-> Found {}".format(lem))
# Output
# Grief is what I feel
#     -> Found grief
# Grieving is not something I'm used to
#     -> Found grieve
# I am grieving
#     -> Found grieve
# Going through grief
#     -> Found grief
# I will grieve
#     -> Found grieve
# I grieved
#     -> Found grieve
# He grieves
#     -> Found grieve
# I am gRieviNG
#     -> Found grieving
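Since the question asked for a simple if statement, the lemma check above can be wrapped in a small helper that returns a boolean. This is only a sketch: the names `GRIEF_LEMMAS` and `mentions_grief` are hypothetical, and the lemma lists are hard-coded stand-ins for what the pipeline would produce, so that the snippet runs without loading a model.

```python
# Hypothetical helper, not part of spaCy: the set holds the three lemmas checked above
GRIEF_LEMMAS = {"grief", "grieve", "grieving"}


def mentions_grief(lemmas):
    """Return True if any lemma, once lowercased, is one of the grief forms."""
    return any(lemma.lower() in GRIEF_LEMMAS for lemma in lemmas)


# Illustrative lemma lists (assumed values, hard-coded instead of coming from nlp)
if mentions_grief(["I", "be", "grieve"]):
    print("Grief detected")
if not mentions_grief(["I", "be", "happy"]):
    print("No grief detected")
```

With the pipeline loaded as in the solutions above, the call would presumably look like `if mentions_grief(t.lemma_ for t in nlp(text)):`, followed by whatever the grief branch should do.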
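# Stemming with NLTK's SnowballStemmer
# Note: word_tokenize may need a one-time nltk.download('punkt') (newer NLTK releases use 'punkt_tab')
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize
stemmer = SnowballStemmer(language='english')
grief = input("What is currently on your mind? ")
for token in word_tokenize(grief):
    # The English Snowball stemmer lowercases its input, so mixed-case tokens are handled too
    stem = stemmer.stem(token)
    if stem in ('grief', 'griev'):
        print("Found {}".format(stem))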
Upvotes: 2