SpaCy save model to disk with custom Sentencizer error

Question

I know a similar question has been asked before:

Custom sentence boundary detection in SpaCy

but my situation is a little different. I want to subclass spaCy's Sentencizer:

from spacy.pipeline import Sentencizer

class MySentencizer(Sentencizer):
    def __init__(self):
        self.tok = create_mySentencizer() # returning the sentences

    def __call__(self, *args, **kwargs):
        doc = args[0]
        for tok in doc:
            pass  # set the sentence boundaries via tok.is_sent_start here
        return doc

Splitting works fine when I call doc = nlp("Text and so on. Another sentence.") after adding the component and updating the model:

  nlp = spacy.load("some_model")
  sentencizer = MySentencizer()
  nlp.add_pipe(sentencizer, before="parser")
  # update model

However, when I try to save the trained model with:

nlp.to_disk("path/to/my/model")

I get the following error:

AttributeError: 'MySentencizer' object has no attribute 'punct_chars'

By contrast, if I use nlp.add_pipe(nlp.create_pipe('sentencizer')), the error does not occur. At what point should I have set the punct_chars attribute? Shouldn't it have been inherited from the superclass?

If I drop the Sentencizer base class and subclass object instead, as in the linked question, it works, but I may lose some valuable information along the way, e.g. punct_chars.

Thanks in advance for your help.

Chris

Sergey Bushmanov · Accepted Answer

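The following should work. Because MySentencizer defines its own __init__, the parent constructor that sets punct_chars never runs unless you call it explicitly (note super(MySentencizer, self).__init__()):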

import spacy
from spacy.pipeline import Sentencizer

class MySentencizer(Sentencizer):
    def __init__(self):
        super(MySentencizer, self).__init__() 

    def __call__(self, *args, **kwargs):
        doc = args[0]
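        for i, tok in enumerate(doc):
            # tok.orth is an integer ID, so compare the text instead.
            # A token starts a sentence if it is first or follows a ".".
            tok.is_sent_start = i == 0 or doc[i - 1].text == "."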
        return doc

nlp = spacy.load("en_core_web_md")
sentencizer = MySentencizer()
nlp.add_pipe(sentencizer, before="parser")

nlp.to_disk("model")
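
The mechanism behind the error can be reproduced without spaCy. The sketch below is only an illustration: Base, Broken, Fixed and to_config are hypothetical stand-ins (not spaCy classes or methods), assuming a parent whose serialization step reads an attribute that only its own __init__ creates.

```python
class Base:
    """Hypothetical stand-in for a parent component such as Sentencizer."""

    def __init__(self, punct_chars=(".", "!", "?")):
        self.punct_chars = list(punct_chars)

    def to_config(self):
        # Serialization reads an attribute that only Base.__init__ creates.
        return {"punct_chars": self.punct_chars}


class Broken(Base):
    def __init__(self):
        # The parent constructor is never called, so punct_chars is never set.
        self.tok = None


class Fixed(Base):
    def __init__(self):
        super().__init__()  # runs Base.__init__, which sets punct_chars
        self.tok = None


broken_error = None
try:
    Broken().to_config()
except AttributeError as err:
    broken_error = str(err)

fixed_config = Fixed().to_config()
print(broken_error)
print(fixed_config)
```

Methods are inherited through the class, but instance attributes such as punct_chars exist only once the constructor that assigns them has actually run, which is why the subclass inherits to_disk yet fails inside it.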
