Reputation: 2730
I used the code below to add custom Lookups to a custom Language class:
import json
import os
from typing import Optional

from spacy.language import Language
from spacy.lookups import Lookups
from spacy.pipeline import Lemmatizer
from thinc.api import Model

# LOOKUP (a dict) and CustomLanguage (a Language subclass) are defined elsewhere.


def create_lookups():
    lookups = Lookups()
    lookups.add_table("lemma_lookup", LOOKUP)
    lookups.add_table("lemma_rules", json_to_dict('lemma_rules.json'))
    lookups.add_table("lemma_index", json_to_dict('lemma_index.json'))
    lookups.add_table("lemma_exc", json_to_dict('lemma_exc.json'))
    return lookups


def json_to_dict(filename):
    # Resolve the JSON file relative to this module's directory.
    location = os.path.realpath(
        os.path.join(os.getcwd(), os.path.dirname(__file__)))
    with open(os.path.join(location, filename)) as f_in:
        return json.load(f_in)


@CustomLanguage.factory(
    "lemmatizer",
    assigns=["token.lemma"],
    default_config={"model": None, "mode": "lookup", "overwrite": False},
    default_score_weights={"lemma_acc": 1.0},
)
def make_lemmatizer(
    nlp: Language, model: Optional[Model], name: str, mode: str, overwrite: bool
):
    lemmatizer = Lemmatizer(nlp.vocab, model, name, mode=mode, overwrite=overwrite)
    lemmatizer.lookups = create_lookups()
    return lemmatizer
But when I instantiate the CustomLanguage, there is no lookup table in nlp.vocab.lookups. What is the problem and how can I solve it?
Upvotes: 0
Views: 404
Reputation: 11494
The lemmatizer lookups are no longer in the vocab. They're stored in the lemmatizer component under nlp.get_pipe("lemmatizer").lookups instead.
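To illustrate where the tables end up, here is a minimal sketch using the built-in lemmatizer in lookup mode. The blank English pipeline and the two-entry table are assumptions made for the example, not part of the question's setup.

```python
import spacy
from spacy.lookups import Lookups

# Blank pipeline with the built-in lemmatizer in lookup mode (illustrative setup).
nlp = spacy.blank("en")
lemmatizer = nlp.add_pipe("lemmatizer", config={"mode": "lookup"})

# A tiny hand-written table standing in for the real lemma_lookup data.
lookups = Lookups()
lookups.add_table("lemma_lookup", {"mice": "mouse", "went": "go"})
lemmatizer.initialize(lookups=lookups)

# The table is attached to the component, not to the vocab.
print(nlp.get_pipe("lemmatizer").lookups.tables)
print(nlp.vocab.lookups.has_table("lemma_lookup"))

lemmas = [token.lemma_ for token in nlp("mice went")]
print(lemmas)
```

Checking nlp.vocab.lookups for lemma tables will therefore come up empty even when the lemmatizer itself is working.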
If your lemmatizer factory creates the lemmatizer like this, anyone loading the model will need to have these JSON files available or the model won't load. (The lookup tables are saved in the model, but your make_lemmatizer function re-reads the JSON files every time the factory runs, so it hasn't been written with this in mind.)
Instead, create a custom lemmatizer class that loads these tables in its initialize method. Your code to add the lemmatizer and load its tables once would then look like this:
import spacy

nlp = spacy.blank("lg")
# Load the tables once, then save them with the pipeline.
nlp.add_pipe("lemmatizer").initialize()
nlp.to_disk("/path/to/model")
Once you've run initialize() for the lemmatizer, the tables are saved in the model directory and you don't need to run it again when you reload the model.
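The save-and-reload behaviour can be checked with a short round trip. This is a sketch under assumptions: it uses the built-in lemmatizer in lookup mode, a one-entry table, and a temporary directory in place of a real model path.

```python
import tempfile

import spacy
from spacy.lookups import Lookups

nlp = spacy.blank("en")
lemmatizer = nlp.add_pipe("lemmatizer", config={"mode": "lookup"})

# Initialize once with an illustrative in-memory table.
lookups = Lookups()
lookups.add_table("lemma_lookup", {"mice": "mouse"})
lemmatizer.initialize(lookups=lookups)

with tempfile.TemporaryDirectory() as model_dir:
    # The component's tables are written out alongside the pipeline.
    nlp.to_disk(model_dir)
    # Reloading does not call initialize() again.
    reloaded = spacy.load(model_dir)

reloaded_tables = reloaded.get_pipe("lemmatizer").lookups.tables
reloaded_lemmas = [token.lemma_ for token in reloaded("mice")]
print(reloaded_tables, reloaded_lemmas)
```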
It could look something like this, which would also allow you to pass a Lookups object to initialize instead if you'd prefer:
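from typing import Callable, Iterable, Optional

from spacy.language import Language
from spacy.lookups import Lookups
from spacy.pipeline import Lemmatizer
from spacy.training import Example


class CustomLemmatizer(Lemmatizer):
    def initialize(
        self,
        get_examples: Optional[Callable[[], Iterable[Example]]] = None,
        *,
        nlp: Optional[Language] = None,
        lookups: Optional[Lookups] = None,
    ):
        # Fall back to the tables built by create_lookups() from the question.
        if lookups is None:
            self.lookups = create_lookups()
        else:
            self.lookups = lookups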
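For the class to be usable with nlp.add_pipe, it also needs a factory that returns it. Here is one possible end-to-end sketch. The factory name "custom_lemmatizer", the function make_custom_lemmatizer, and the inline one-entry table in create_lookups are all assumptions for illustration; the real create_lookups would read the JSON files from the question, and the factory could be registered on the custom language class instead of Language.

```python
from typing import Callable, Iterable, Optional

import spacy
from spacy.language import Language
from spacy.lookups import Lookups
from spacy.pipeline import Lemmatizer
from spacy.training import Example
from thinc.api import Model


def create_lookups() -> Lookups:
    # Stand-in for the question's helper; uses inline data instead of JSON files.
    lookups = Lookups()
    lookups.add_table("lemma_lookup", {"mice": "mouse"})
    return lookups


class CustomLemmatizer(Lemmatizer):
    def initialize(
        self,
        get_examples: Optional[Callable[[], Iterable[Example]]] = None,
        *,
        nlp: Optional[Language] = None,
        lookups: Optional[Lookups] = None,
    ):
        # Tables are loaded here, once, rather than in the factory.
        self.lookups = create_lookups() if lookups is None else lookups


# Hypothetical factory name, chosen to avoid clashing with the built-in one.
@Language.factory(
    "custom_lemmatizer",
    assigns=["token.lemma"],
    default_config={"model": None, "mode": "lookup", "overwrite": False},
    default_score_weights={"lemma_acc": 1.0},
)
def make_custom_lemmatizer(
    nlp: Language, model: Optional[Model], name: str, mode: str, overwrite: bool
):
    # The factory only constructs the component; it does not touch any files.
    return CustomLemmatizer(nlp.vocab, model, name, mode=mode, overwrite=overwrite)


nlp = spacy.blank("en")
nlp.add_pipe("custom_lemmatizer").initialize()
lemmas = [token.lemma_ for token in nlp("mice")]
print(lemmas)
```

Because the factory no longer reads the JSON files, a saved pipeline can be reloaded on a machine that only has the model directory, as long as the code that registers the factory is importable there.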
Upvotes: 3