iCHAIT

Reputation: 594

How to get Index of an Entity in a Sentence in Spacy?

Is there an elegant way to get the index of an entity relative to the sentence it appears in? I know I can get an entity's character offsets with ent.start_char and ent.end_char, but those values are relative to the entire string.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion. Apple just launched a new Credit Card.")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

I want the entity Apple to report start and end indexes of 0 and 5 in both sentences, that is, offsets counted from the start of the sentence each occurrence belongs to. How can I do that?

Upvotes: 10

Views: 4168

Answers (1)

Wiktor Stribiżew

Reputation: 626926

You need to subtract the sentence's start position (ent.sent.start_char) from both the entity's start and end positions:

for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
#                                 ^^^^^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^^^^^^^

Output:

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
Apple 0 5 ORG
Credit Card 26 37 ORG
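
If you need these offsets in more than one place, the subtraction can be wrapped in a small helper. The sketch below is only an illustration: the name sentence_relative_offsets is made up here and is not part of spaCy, and the SimpleNamespace objects are stand-ins for a real entity span so that the snippet runs without a model installed. The stand-in values (sentence starting at character 56, entity at 82 to 93) are derived from the example text and the output above.

```python
from types import SimpleNamespace


def sentence_relative_offsets(ent):
    """Return (start, end) character offsets of `ent` within its own sentence.

    `ent` is expected to look like a spaCy entity span, i.e. to expose
    `start_char`, `end_char` and `sent.start_char`. The function name is
    hypothetical, not a spaCy API.
    """
    sent_start = ent.sent.start_char
    return ent.start_char - sent_start, ent.end_char - sent_start


# Stand-in for the "Credit Card" entity from the example: the second
# sentence starts at character 56 of the full text, and the entity
# covers characters 82..93 of the full text.
sentence_text = "Apple just launched a new Credit Card."
fake_sent = SimpleNamespace(start_char=56, text=sentence_text)
fake_ent = SimpleNamespace(start_char=82, end_char=93, sent=fake_sent, text="Credit Card")

start, end = sentence_relative_offsets(fake_ent)
print(start, end)                      # 26 37
print(fake_sent.text[start:end])       # Credit Card
```

With a real pipeline you would call it as sentence_relative_offsets(ent) for each ent in doc.ents. Slicing ent.sent.text with the result should give back ent.text, which is a handy sanity check. Note that ent.sent needs sentence boundaries to be set, so the pipeline has to include a component that sets them, such as the parser or a sentencizer. The same subtraction should also work for token indexes if you use ent.start - ent.sent.start and ent.end - ent.sent.start instead of the character attributes.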

Upvotes: 17
