Reputation: 21
While exploring special-case lemmatization, I ran into a ValueError (shown below). I want to modify the text, changing "Frisco" to "San Francisco." Does anyone see what I am doing wrong here?
# (using spacy v3.4)
import spacy
from spacy.symbols import *
nlp = spacy.load("en_core_web_sm")
doc = nlp("I am flying to Frisco.")
special_case = [{NORM: "Frisco", ORTH: "San Francisco"}]
print([token.text for token in doc])
nlp.tokenizer.add_special_case("Frisco", special_case)
print([token.lemma_ for token in nlp("I am flying to Frisco.")])
ValueError: [E997] Tokenizer special cases are not allowed to modify the text. This would map 'Frisco' to 'San Francisco' given token attributes '[{67: 'Frisco', 65: 'San Francisco'}]'
Upvotes: 2
Views: 189
Reputation: 15593
As the error says, tokenizer special cases are not allowed to modify the text. spaCy never changes the text of a Doc object: keeping doc.text identical to the input string keeps character offsets and annotations consistent and avoids many classes of bugs, even if it can be inconvenient.
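What a special case *can* change is the NORM attribute: keep ORTH identical to the token text and put the replacement in NORM, then read it back via token.norm_. A minimal sketch of that approach (using a blank English pipeline here so no trained model is needed; with en_core_web_sm the same call works):

```python
import spacy
from spacy.attrs import ORTH, NORM

nlp = spacy.blank("en")

# ORTH must match the original text exactly; NORM may differ.
nlp.tokenizer.add_special_case("Frisco", [{ORTH: "Frisco", NORM: "San Francisco"}])

doc = nlp("I am flying to Frisco.")
print([(token.text, token.norm_) for token in doc])
# token.text stays "Frisco"; token.norm_ becomes "San Francisco"
```

The visible text is untouched, but the normalized form carries the mapping, which downstream components (and your own code) can use instead of the raw text.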
Upvotes: 1