Reputation: 71
I have this text (text2 in the code) that contains the word 'by' three times. I want to use spaCy to extract the person's name: the full name, even if it is three words long (some cultures use long names; in this case it is two words). My code is below, but my pattern raises an error. My intention is to first fix the word 'by' with ORTH, and then tell the program that whatever comes next is a PERSON entity. I would be happy if anyone could help:
import spacy
from spacy.matcher import Matcher

# Assumption: nlp was not defined in the original snippet; en_core_web_sm is one English pipeline with NER
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

text2 = 'All is done by Emily Muller, the leaf is burned by fire. we were not happy, so we cut relations by saying bye bye'

def extract_person(nlp_doc):
    pattern = [{'ORTH': 'by'}, {'POS': 'NOUN'}]
    # second possible pattern:
    #pattern = [{"TEXT": "by"}, {"NER": "PERSON"}]
    # spaCy v3 signature; v2 used matcher.add('person_only', None, pattern)
    matcher.add('person_only', [pattern])
    matches = matcher(nlp_doc)
    for match_id, start, end in matches:
        span = nlp_doc[start:end]
        return span.text  # returns only the first match

target_doc = nlp(text2)
print(extract_person(target_doc))
Put another way: how do I use NER tags in a Matcher pattern in spaCy?
Upvotes: 1
Views: 1541
Reputation: 1782
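To match whole names, merge the entities first, so that a multi-word name such as "Emily Muller" becomes a single token. You can do this by adding spaCy's built-in merge_entities component after the NER: nlp.add_pipe("merge_entities", after="ner")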
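To see what the merge does, here is a minimal sketch assuming spaCy v3. So that it runs without downloading a trained model, it uses a blank English pipeline and marks "Emily Muller" as a PERSON entity by hand; with a trained pipeline such as en_core_web_sm, the ner component would set the entity instead. Calling the merge_entities component on the Doc turns the two name tokens into one.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank pipeline stands in for a trained model, so the
# PERSON entity is set by hand instead of being predicted by "ner".
nlp = spacy.blank("en")
doc = nlp("All is done by Emily Muller, the leaf is burned by fire.")

start = [t.text for t in doc].index("Emily")
doc.ents = [Span(doc, start, start + 2, label="PERSON")]
tokens_before = [t.text for t in doc]  # "Emily" and "Muller" are separate tokens

# The built-in "merge_entities" component retokenizes the Doc so that
# every entity span becomes a single token carrying the entity label.
merger = nlp.add_pipe("merge_entities")
doc = merger(doc)
tokens_after = [t.text for t in doc]  # "Emily Muller" is now one token

print(tokens_before)
print(tokens_after)
print(doc[start].text, doc[start].ent_type_)
```

After merging, a single pattern token such as {"ENT_TYPE": "PERSON"} covers the whole name.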
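Then, in your pattern, instead of:
pattern = [{"TEXT": "by"}, {"NER": "PERSON"}]
use ENT_TYPE, the token attribute that holds the entity label (NER is not an attribute the Matcher accepts):
pattern = [{"TEXT": "by"}, {"ENT_TYPE": "PERSON"}]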
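Because every token inside a multi-token entity carries the same ENT_TYPE, a quantifier can also cover the whole name without merging. The following is a sketch of that alternative (not part of the approach above), assuming spaCy v3, where matcher.add accepts greedy="LONGEST" to keep only the longest of overlapping matches; the blank pipeline and hand-set entity again stand in for a trained NER model.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

# Assumption: blank pipeline + hand-set entity stand in for a trained NER model.
nlp = spacy.blank("en")
text = ("All is done by Emily Muller, the leaf is burned by fire. "
        "we were not happy, so we cut relations by saying bye bye")
doc = nlp(text)
i = [t.text for t in doc].index("Emily")
doc.ents = [Span(doc, i, i + 2, label="PERSON")]

# "by" (any casing) followed by one or more tokens labelled PERSON.
pattern = [{"LOWER": "by"}, {"ENT_TYPE": "PERSON", "OP": "+"}]

matcher = Matcher(nlp.vocab)
# greedy="LONGEST" drops the overlapping shorter match "by Emily".
matcher.add("person_only", [pattern], greedy="LONGEST")

matches = matcher(doc)
# Drop the leading "by" token to keep just the name.
names = [doc[s + 1:e].text for _, s, e in matches]
print(names)
```

One caveat with this variant: "+" does not respect entity boundaries, so two different PERSON entities that happen to be adjacent would come back as a single match. Merging entities first avoids that.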
Complete code:
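import spacy
from spacy.matcher import Matcher

# Any English pipeline with an "ner" component works; en_core_web_sm is one example
nlp = spacy.load("en_core_web_sm")
# Merge each entity (e.g. "Emily Muller") into a single token
nlp.add_pipe("merge_entities", after="ner")

text2 = 'All is done by Emily Muller, the leaf is burned by fire. we were not happy, so we cut relations by saying bye bye'
doc = nlp(text2)

pattern = [{"TEXT": "by"}, {"ENT_TYPE": "PERSON"}]
matcher = Matcher(nlp.vocab)
matcher.add('person_only', [pattern])
matches = matcher(doc)
for match_id, start, end in matches:
    print(doc[start:end])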
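If only the names are needed, the Matcher is not strictly required: each span in doc.ents has a label and a start index, so you can simply look at the token just before it. A minimal sketch of that alternative, assuming spaCy v3; the function name names_after_by is hypothetical, and the blank pipeline with a hand-set entity again replaces a trained model whose ner component would fill doc.ents:

```python
import spacy
from spacy.tokens import Span


def names_after_by(doc):
    """Return the text of every PERSON entity immediately preceded by 'by'."""
    return [
        ent.text
        for ent in doc.ents
        if ent.label_ == "PERSON"
        and ent.start > 0
        and doc[ent.start - 1].lower_ == "by"
    ]


# Assumption: a blank pipeline plus a hand-set entity stands in for a
# trained model such as en_core_web_sm, whose "ner" component would
# normally fill doc.ents.
nlp = spacy.blank("en")
text2 = ("All is done by Emily Muller, the leaf is burned by fire. "
         "we were not happy, so we cut relations by saying bye bye")
doc = nlp(text2)
i = [t.text for t in doc].index("Emily")
doc.ents = [Span(doc, i, i + 2, label="PERSON")]

print(names_after_by(doc))
```

Since an entity span already covers the full name, this variant needs no merge_entities step.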
Upvotes: 2