jack

Reputation: 21

Special Case Lemmatization ValueError while Using spacy for NLP

Problem (What I think is happening)

While exploring special case lemmatization, I ran into a ValueError (shown below). I want to modify the text, mapping "Frisco" to "San Francisco". Does anyone see what I am doing wrong here?

Code

# (using spacy v3.4)
import spacy
from spacy.symbols import *  # provides the ORTH and NORM attribute IDs
nlp = spacy.load("en_core_web_sm")
doc = nlp("I am flying to Frisco.")
print([token.text for token in doc])
special_case = [{NORM: "Frisco", ORTH: "San Francisco"}]
nlp.tokenizer.add_special_case("Frisco", special_case)
print([token.lemma_ for token in nlp("I am flying to Frisco.")])

Error

ValueError: [E997] Tokenizer special cases are not allowed to modify the text. This would map 'Frisco' to 'San Francisco' given token attributes '[{67: 'Frisco', 65: 'San Francisco'}]'

Upvotes: 2

Views: 189

Answers (1)

polm23
polm23

Reputation: 15593

As the error says, tokenizer special cases are not allowed to modify the text. spaCy never changes the text of a Doc object: doc.text always reproduces the original input exactly. This is a deliberate design decision that keeps data consistent and avoids many classes of problems, even if it can be inconvenient.
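A sketch of one way to keep the surface text intact while still recording the normalized form (not from the original answer; it assumes spaCy v3, where a special case may set NORM as long as the ORTH values match the text exactly):

```python
import spacy
from spacy.symbols import ORTH, NORM

# A blank English pipeline is enough to demonstrate the tokenizer behavior.
nlp = spacy.blank("en")

# ORTH must reproduce the matched text; NORM carries the normalized form.
nlp.tokenizer.add_special_case("Frisco", [{ORTH: "Frisco", NORM: "San Francisco"}])

doc = nlp("I am flying to Frisco.")
print(doc.text)                              # the text itself is unchanged
print([(t.text, t.norm_) for t in doc])      # "Frisco" now normalizes to "San Francisco"
```

If you need the replacement in the output string rather than as a token attribute, that is a string-rewriting step done outside spaCy (e.g. before or after processing), not a tokenizer special case.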

Upvotes: 1
