Madhur Yadav

Reputation: 723

How to extract specific phrases as single entities in negspacy

When I apply negspacy to a sentence, I want it to treat a specific phrase as a single entity and report the negation result for that whole phrase.

import en_core_sci_lg
from negspacy.negation import Negex

nlp = en_core_sci_lg.load()

negex = Negex(nlp, language="en_clinical_sensitive")
nlp.add_pipe(negex, last=True)

doc = nlp(""" patient has no signs of shortness of breath. """)

for word in doc.ents:
    print(word, word._.negex)

The output is:

patient False
shortness True

The output I want is:

patient False
shortness of breath True

How can I get phrases like "shortness of breath", "sore throat", and "respiratory distress" treated as single entities?

I have tried:

import en_core_sci_lg
from negspacy.negation import Negex
from spacy.pipeline import EntityRuler

nlp = en_core_sci_lg.load()

ruler = EntityRuler(nlp)
patterns = [{"label": "ENTITY", "pattern": [{"LOWER": "shortness"}, {"LOWER": "of"}, {"LOWER": "breath"}]}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

negex = Negex(nlp, language="en_clinical")
nlp.add_pipe(negex, last=True)

doc = nlp("""patient has no signs of shortness of breath. """)

for word in doc.ents:
    print(word, word._.negex)

The output is still:

patient False
shortness True

What can I do to solve this problem?

Upvotes: 2

Views: 1430

Answers (3)

Abe

Reputation: 11

You can just use:

nlp.add_pipe(ruler, before="ner")

This runs the ruler before the model's NER component, which should solve the issue.

Upvotes: 1

Madhur Yadav

Reputation: 723

Rather than using the scispacy model en_core_sci_lg, I used a standard English spaCy model and got the desired results. See https://spacy.io/usage/rule-based-matching#entityruler-usage

Upvotes: 0

amin_nejad

Reputation: 1090

You may be glad to know the solution to your problem is very simple. You are just missing the keyword argument overwrite_ents=True in the EntityRuler constructor, so the custom pattern you are adding is being overwritten by other entities. Just change:

ruler = EntityRuler(nlp)

to

ruler = EntityRuler(nlp, overwrite_ents=True)

Now, my output is:

patient False
shortness of breath True
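
To make the effect of overwrite_ents easier to see, here is a minimal pure-Python sketch of how overlapping entities could be resolved. It is a simplified model under stated assumptions, not spaCy's actual implementation: the helper names overlaps and apply_ruler are hypothetical, spans are plain (start, end, text) tuples with an exclusive end, and the ruler is assumed to run after the model's NER (which is what nlp.add_pipe(ruler) does when no position is given). The existing entities are taken from the output shown in the question.

```python
# Simplified model of entity-overlap handling; NOT spaCy's real code.
# A span is (start_token, end_token, text) with an exclusive end index.

def overlaps(a, b):
    """Return True if two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def apply_ruler(existing, matches, overwrite_ents=False):
    """Combine entities already on the doc with the ruler's matches."""
    entities = list(existing)
    for match in matches:
        clash = any(overlaps(match, ent) for ent in entities)
        if clash and not overwrite_ents:
            continue  # keep the earlier entity, drop the rule match
        # Remove every earlier entity that shares tokens with the match.
        entities = [ent for ent in entities if not overlaps(match, ent)]
        entities.append(match)
    return sorted(entities)


# Tokens: patient(0) has(1) no(2) signs(3) of(4) shortness(5) of(6) breath(7) .(8)
existing = [(0, 1, "patient"), (5, 6, "shortness")]  # entities from the model's NER
matches = [(5, 8, "shortness of breath")]            # match from the custom pattern

default_result = [text for _, _, text in apply_ruler(existing, matches)]
overwrite_result = [
    text for _, _, text in apply_ruler(existing, matches, overwrite_ents=True)
]

print(default_result)    # ['patient', 'shortness']
print(overwrite_result)  # ['patient', 'shortness of breath']
```

In this model the rule match loses to the existing "shortness" entity by default, and replaces it once overwriting is enabled, which mirrors the before and after outputs in this thread.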

Upvotes: 1
