spacy adding special case tokenization rules by regular expression or pattern

Question

I want to add special case for tokenization in spacy according to the documentation. The documentation shows how specific words can be considered as special cases. I want to be able to specify a pattern (e.g. a suffix). For example, I have a string like this

text = "A sample string with and "

where specifies a single word.

I know I can have it for one special case at a time by the following code. But how can I specify a pattern for that?

import spacy
from spacy.symbols import ORTH
nlp = spacy.load('en', vectors=False,parser=False, entity=False) 
nlp.tokenizer.add_special_case(u'', [{ORTH: u''}])

spacy adding special case tokenization rules by regular expression or pattern

Answers (1)

Related Questions