Reputation: 1140
Is it possible to change a single entity in spaCy? I have some Doc objects in a list, and some of the docs contain a "FRAUD" label. However, I need to change a few of the "FRAUD" entity labels to "FALSE_ALARM". I'm using spaCy's Matcher to find the "FALSE_ALARM" entities, but I can't override the existing label. I have tried the following:
def add_event_ent(matcher, doc, i, matches):
    match_id, start, end = matches[i]
    match_doc = doc[start:end]
    for entity in match_doc.ents:
        # k.label = neg_hash <-- says "attribute 'label' of 'spacy.tokens.span.Span' objects is not writable"
        span = Span(doc, entity.start, entity.end, label=false_alarm_hash)
        doc.ents = list(doc.ents) + [span]  # add span to doc.ents
ValueError: [E098] Trying to set conflicting doc.ents: '(14, 16, 'FRAUD')' and '(14, 16, 'FALSE_ALARM')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
Upvotes: 3
Views: 4285
Reputation: 11484
The error message tells you what's going on: spaCy doesn't allow overlapping entities, and you're trying to add a new entity to a token without deleting the original entity first. You want something more like:
for entity in match_doc.ents:
    span = Span(doc, entity.start, entity.end, label=false_alarm_hash)
    doc.ents = [span if e == entity else e for e in doc.ents]
This is a one-line change to your current code to get it to work, but the list comprehension is inefficient: it rebuilds the full entity list once for every matched entity. Unless you have very few matches, you probably want to restructure how you process the matches so that you don't iterate over the whole list of entities repeatedly. It might make more sense to process all the matches as a list (matches = matcher(doc)) rather than using a callback function.
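As a rough sketch of that restructuring, assuming spaCy v3 (where Matcher.add takes a list of patterns and Span accepts a string label): collect the entities to relabel from all matches first, then assign doc.ents a single time. The helper name relabel_matched_ents, the sample sentence, the hand-set entities, and the pattern are all made up for illustration and are not from your code.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span


def relabel_matched_ents(doc, matches, old_label="FRAUD", new_label="FALSE_ALARM"):
    """Hypothetical helper: relabel old_label entities that fall inside any match."""
    to_relabel = set()
    for _match_id, start, end in matches:
        # doc[start:end].ents yields the entities fully contained in the match.
        for ent in doc[start:end].ents:
            if ent.label_ == old_label:
                to_relabel.add((ent.start, ent.end))
    # Replace (rather than append) the affected entities, so no two spans
    # overlap, and assign doc.ents only once.
    doc.ents = [
        Span(doc, ent.start, ent.end, label=new_label)
        if (ent.start, ent.end) in to_relabel
        else ent
        for ent in doc.ents
    ]
    return doc


# Toy setup (assumption): a blank English pipeline with entities set by hand,
# standing in for whatever model produced your "FRAUD" entities.
nlp = spacy.blank("en")
doc = nlp("The credit card fraud alert was a drill but the wire fraud case was real")
doc.ents = [Span(doc, 1, 4, label="FRAUD"), Span(doc, 10, 12, label="FRAUD")]

# Illustrative pattern: treat "credit card fraud alert" as a false alarm.
matcher = Matcher(nlp.vocab)
matcher.add(
    "FALSE_ALARM",
    [[{"LOWER": "credit"}, {"LOWER": "card"}, {"LOWER": "fraud"}, {"LOWER": "alert"}]],
)

matches = matcher(doc)  # process all matches as a list, no callback
relabel_matched_ents(doc, matches)
print([(ent.text, ent.label_) for ent in doc.ents])
```

With this toy input, only the entity inside the match should be relabeled, and the other "FRAUD" entity is left alone. On spaCy v2 the Matcher.add call would need the older signature.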
Upvotes: 5