yac

Reputation: 63

How to make spaCy case-insensitive

How can I make spaCy case-insensitive when finding entity names?

Is there a code snippet that I should add? The questions could mention entities that are not capitalized.

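import spacy

# Assumed model name; the question does not say which model is loaded.
nlp = spacy.load("en_core_web_sm")

def analyseQuestion(question):
    # Run the pipeline and return the named entities it found
    doc = nlp(question)
    entity = doc.ents

    return entity

print(analyseQuestion("what is the best seller of Nicholas Sparks "))
print(analyseQuestion("what is the best seller of nicholas sparks "))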

which gives

(Nicholas Sparks,)  
()

Upvotes: 6

Views: 9404

Answers (2)

happy_marmoset

Reputation: 2215

It is very easy. You just need to add a preprocessing step of question.lower() to your function:

def analyseQuestion(question):

    # Preprocess question to make further analysis case-insensitive
    question = question.lower()

    doc = nlp(question)
    entity=doc.ents 

    return entity

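The solution is inspired by this code from the Rasa NLU library. However, for non-English (non-ASCII) text held in Python 2 byte strings it might not work (in Python 3, str is already Unicode and lower() handles it directly). For the Python 2 case you can try: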

question = question.decode('utf8').lower().encode('utf8')
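For readers on Python 3, here is a minimal sketch of the same preprocessing step without the decode/encode round trip. The helper name normalise_case and the sample sentence are made up for illustration; only str.lower() comes from the answer above.

```python
def normalise_case(question: str) -> str:
    """Lowercase a question; in Python 3, str.lower() is Unicode-aware."""
    return question.lower()


# Hypothetical non-ASCII input, just to show that accented letters are handled
print(normalise_case("Quel est le best-seller d'Émile Zola ?"))
```

Calling .decode() on a Python 3 str raises AttributeError, so the one-liner above only applies to Python 2.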

However, the NER module in spaCy depends to some extent on the case of the tokens, and because it is a statistically trained model you might see discrepancies after lowercasing. Refer to this link.

Upvotes: -1

Mr. Robot Jr.

Reputation: 66

This is old, but hopefully this will help anyone looking at similar problems.

You can use a truecaser to improve your results.

https://pypi.org/project/truecase/
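A rough sketch of how a truecaser could be wired into the function from the question. This is one possible arrangement, not something the truecase package prescribes: the function name analyse_question and its parameters nlp and restore_case are hypothetical. It runs NER on the question as typed and retries on a re-cased copy only if nothing was found.

```python
def analyse_question(question, nlp, restore_case):
    """Return entity texts, retrying on a re-cased question if none are found.

    nlp          -- a loaded spaCy pipeline (any callable returning an object with .ents)
    restore_case -- a callable str -> str, e.g. truecase.get_true_case
    """
    doc = nlp(question)
    if not doc.ents:
        # No entities in the original casing: try again on the re-cased text
        doc = nlp(restore_case(question))
    return tuple(ent.text for ent in doc.ents)


# Intended usage (needs spaCy, a downloaded model, and the truecase package;
# the model name is an assumption):
#
#   import spacy
#   import truecase
#   nlp = spacy.load("en_core_web_sm")
#   print(analyse_question("what is the best seller of nicholas sparks",
#                          nlp, truecase.get_true_case))
```

Passing the pipeline and the truecaser in as arguments keeps the sketch independent of which model or casing tool is used. Whether the retry actually recovers an entity depends on the truecaser and the model.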

Upvotes: 0
