tursunWali

Reputation: 71

How to extract a PERSON named entity after a certain word with spaCy?

I have this text (text2 in the code below). It contains the word 'by' three times. I want to use spaCy to extract the person's name that follows 'by' (the full name, even if it is three words long, since some cultures use long names; in this case it is two words). My code is below, but my pattern raises an error. My intention is to first anchor on the word 'by' with ORTH, and then tell the program that whatever comes next is a named entity of type PERSON. I would be grateful for any help:

import spacy
from spacy.matcher import Matcher
# assumption: any English pipeline with an NER component
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
text2 = 'All is done by Emily Muller, the leaf is burned by fire. we were not happy, so we cut relations by saying bye bye'
def extract_person(nlp_doc):
    pattern = [{'ORTH': 'by'}, {'POS': 'NOUN'}]
    # second possible pattern:
    #pattern = [{"TEXT": "by"}, {"NER": "PERSON"}]
    matcher.add('person_only', None, pattern)
    matches = matcher(nlp_doc)
    for match_id, start, end in matches:
        span = nlp_doc[start:end]
        return span.text
target_doc = nlp(text2)
extract_person(target_doc)

I think this question can also be asked the other way around: how do I use NER tags in a Matcher pattern in spaCy?

Upvotes: 1

Views: 1541

Answers (1)

krisograbek

Reputation: 1782

If you want to match whole names, merge entities first, so that each multi-word entity becomes a single token. You can do this by calling: nlp.add_pipe("merge_entities", after="ner")

Then in your pattern instead of:

pattern = [{"TEXT": "by"}, {"NER": "PERSON"}]

Use:

pattern = [{"TEXT": "by"}, {"ENT_TYPE": "PERSON"}]
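To see why the merge step matters, here is a minimal sketch that compares tokenization with and without merge_entities. To keep it runnable without a downloaded model, it assumes a blank English pipeline and uses an entity_ruler with a hand-written PERSON rule as a stand-in for the trained ner component; with a real model, ner would supply the entity instead. The helper name build_pipeline is hypothetical.

```python
import spacy

text = "All is done by Emily Muller, the leaf is burned by fire."


def build_pipeline(merge):
    # Blank pipeline: tokenizer only, no trained components (assumption for this sketch).
    nlp = spacy.blank("en")
    # Stand-in for "ner": label the name as PERSON with a hand-written rule.
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns([{"label": "PERSON", "pattern": "Emily Muller"}])
    if merge:
        # Collapse each entity span into a single token.
        nlp.add_pipe("merge_entities")
    return nlp


plain_tokens = [t.text for t in build_pipeline(merge=False)(text)]
merged_tokens = [t.text for t in build_pipeline(merge=True)(text)]

print(plain_tokens)
print(merged_tokens)
```

With the merged pipeline the name is one token, so a single {"ENT_TYPE": "PERSON"} element in the pattern can cover the whole name.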

Complete code:

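import spacy
from spacy.matcher import Matcher

# assumption: any English pipeline with an "ner" component; the small model is used here
nlp = spacy.load("en_core_web_sm")
# merge each multi-token entity into a single token
nlp.add_pipe("merge_entities", after="ner")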

text2 = 'All is done by Emily Muller, the leaf is burned by fire. we were not happy, so we cut relations by saying bye bye'

doc = nlp(text2)

pattern = [{"TEXT": "by"}, {"ENT_TYPE": "PERSON"}]

matcher = Matcher(nlp.vocab)

matcher.add('person_only', [pattern])
matches = matcher(doc)
for match_id, start, end in matches:
    print(doc[start:end])
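
Note that each match printed above includes the word "by" itself. If only the name is wanted, and you would rather not merge tokens, one possible variant is sketched below: it lets the PERSON element repeat with "OP": "+", keeps the longest match via greedy="LONGEST" (spaCy v3), and slices off the leading token. The function name names_after_by is hypothetical, and the entity is set by hand on a blank pipeline purely so the sketch runs without a model download; in real use the trained ner component would fill doc.ents.

```python
import spacy
from spacy.matcher import Matcher


def names_after_by(doc, vocab):
    """Return the PERSON names that directly follow the word 'by'."""
    matcher = Matcher(vocab)
    # "by" followed by one or more tokens labelled PERSON
    pattern = [{"LOWER": "by"}, {"ENT_TYPE": "PERSON", "OP": "+"}]
    # greedy="LONGEST" keeps only the longest of the overlapping matches
    matcher.add("by_person", [pattern], greedy="LONGEST")
    # start + 1 skips the leading "by" token so only the name remains
    return [doc[start + 1:end].text for _, start, end in matcher(doc)]


nlp = spacy.blank("en")  # assumption: no trained model available
text = ("All is done by Emily Muller, the leaf is burned by fire. "
        "we were not happy, so we cut relations by saying bye bye")
doc = nlp(text)

# Hand-label the name as PERSON; a trained "ner" component would normally do this.
begin = text.index("Emily Muller")
doc.ents = [doc.char_span(begin, begin + len("Emily Muller"), label="PERSON")]

names = names_after_by(doc, nlp.vocab)
print(names)  # ['Emily Muller']
```

The other two occurrences of "by" are followed by tokens without a PERSON label, so they produce no match.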

Upvotes: 2
