Reputation: 63
How can I make spaCy case-insensitive when finding entity names?
Is there a code snippet I should add? The questions may mention entities that are not capitalized.
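    import spacy

    # Assumed model name; the question does not say which model is loaded.
    nlp = spacy.load("en_core_web_sm")

    def analyseQuestion(question):
        doc = nlp(question)
        entity = doc.ents
        return entity

    print(analyseQuestion("what is the best seller of Nicholas Sparks "))
    print(analyseQuestion("what is the best seller of nicholas sparks "))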
which gives
(Nicholas Sparks,)
()
Upvotes: 6
Views: 9404
Reputation: 2215
It is very easy. You just need to add a preprocessing step, question.lower(), to your function:
    def analyseQuestion(question):
        # Preprocess the question to make further analysis case-insensitive
        question = question.lower()
        doc = nlp(question)
        entity = doc.ents
        return entity
This solution is inspired by this code from the Rasa NLU library. However, it might not work for non-English (non-ASCII) text. In that case, on Python 2, where the question is a UTF-8 byte string, you can try:
    question = question.decode('utf8').lower().encode('utf8')
However, the NER module in spaCy depends to some extent on the case of the tokens, and since it is a statistically trained model you might see some discrepancies. Refer to this link.
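To illustrate the non-ASCII point, here is a small sketch assuming Python 3, where str is already Unicode. The sample string is made up for the example. It shows that lower() on a str handles accented letters directly, while lower() on raw UTF-8 bytes only changes ASCII letters, which is the situation the decode/encode round trip works around.

```python
# Hypothetical sample text containing a non-ASCII capital letter.
text = "ÉMILE ZOLA"

# On a Python 3 str, lower() is Unicode-aware.
lowered = text.lower()
print(lowered)  # émile zola

# On UTF-8 bytes, lower() only touches ASCII letters, so the "É" survives.
raw = text.encode("utf8")
bytes_lowered = raw.lower().decode("utf8")
print(bytes_lowered)  # Émile zola

# Decoding first, lowering, then re-encoding gives the fully lowercased bytes.
round_trip = raw.decode("utf8").lower().encode("utf8")
print(round_trip == lowered.encode("utf8"))  # True
```

So on Python 3 a plain question.lower() should be enough as long as the question is a str and not bytes.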
Upvotes: -1
Reputation: 66
This is old, but hopefully this will help anyone looking at similar problems.
You can use a truecaser to improve your results.
https://pypi.org/project/truecase/
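To show the idea without extra dependencies, here is a rough stand-in and not the truecase package itself: it restores capitalization from a fixed list of names before the text would be handed to spaCy. KNOWN_NAMES and restore_case are hypothetical names made up for this sketch. A real truecaser uses a statistical model and does not need such a list.

```python
import re

# Hypothetical list of names whose capitalization should be restored.
KNOWN_NAMES = ["Nicholas Sparks", "Stephen King"]

def restore_case(text, names=KNOWN_NAMES):
    """Replace case-insensitive matches of each known name with its canonical form."""
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        # A function replacement avoids any escape processing of the name itself.
        text = pattern.sub(lambda match, canonical=name: canonical, text)
    return text

restored = restore_case("what is the best seller of nicholas sparks")
print(restored)  # what is the best seller of Nicholas Sparks
```

The restored string is what you would then pass to analyseQuestion. With the truecase package installed, its get_true_case function (as far as I know, that is the entry point) would take the place of restore_case.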
Upvotes: 0