Reputation: 594
I want to know if there is an elegant way to get the index of an Entity with respect to a Sentence. I know I can get the index of an Entity in a string using ent.start_char
and ent.end_char
, but that value is with respect to the entire string.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion. Apple just launched a new Credit Card.")
for ent in doc.ents:
print(ent.text, ent.start_char, ent.end_char, ent.label_)
I want the Entity Apple
in both the sentences to point to start and end indexes 0 and 5 respectively. How can I do that?
Upvotes: 10
Views: 4168
Reputation: 626926
You need to subtract the sentence start position from the entity start positions:
for ent in doc.ents:
print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
# ^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
Output:
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
Apple 0 5 ORG
Credit Card 26 37 ORG
Upvotes: 17