tamuhey

Reputation: 3535

How do I store a custom class object in a spaCy `Doc` and save it with `doc.to_disk`?

I want to store an instance of my own class in a spaCy `Doc` and save it with `doc.to_disk`, as follows:

from spacy.tokens import Doc
from spacy.vocab import Vocab
from dataclasses import dataclass


@dataclass
class Foo:
    a: int


doc = Doc(Vocab(), [])
doc.user_data["foo"] = Foo(1)
doc.to_disk("/tmp/fooo")

But this code raises an error:

TypeError: can not serialize 'Foo' object

What should I do?

Upvotes: 1

Views: 348

Answers (1)

APhillips

Reputation: 1181

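Per this thread, you can try the following workaround. It does not save the custom object; it clears `doc.user_data` and all custom extension attributes so that the doc can be serialized: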

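    def remove_unserializable_results(doc):
        # Drop everything stored in user_data, including custom objects.
        doc.user_data = {}
        # Reset every custom Doc extension, skipping the Underscore helper methods.
        for x in dir(doc._):
            if x in ['get', 'set', 'has']:
                continue
            setattr(doc._, x, None)
        # Do the same for the custom extensions on each token.
        for token in doc:
            for x in dir(token._):
                if x in ['get', 'set', 'has']:
                    continue
                setattr(token._, x, None)
        return doc

    # `nlp` is your loaded pipeline. This is the spaCy v2 call; in v3, register the
    # function with @Language.component and add it by its registered name.
    nlp.add_pipe(remove_unserializable_results, last=True)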
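
If the data itself needs to survive the round trip, one option is to keep only built-in types in `doc.user_data`. The sketch below assumes spaCy serializes `user_data` with msgpack (as the error message suggests), so a plain dict of ints should pass through. `foo_to_user_data` and `foo_from_user_data` are hypothetical helper names, not part of spaCy.

```python
from dataclasses import dataclass, asdict


@dataclass
class Foo:
    a: int


def foo_to_user_data(foo: Foo) -> dict:
    """Convert a Foo into a plain dict of built-in types (hypothetical helper)."""
    return asdict(foo)


def foo_from_user_data(data: dict) -> Foo:
    """Rebuild a Foo from the dict that was stored (hypothetical helper)."""
    return Foo(**data)


# Round trip without spaCy, to show that the conversion itself is lossless.
stored = foo_to_user_data(Foo(1))
restored = foo_from_user_data(stored)
```

With these helpers you would write `doc.user_data["foo"] = foo_to_user_data(Foo(1))` before calling `doc.to_disk`, and call `foo_from_user_data(doc.user_data["foo"])` after loading. Nested dataclasses are also handled by `asdict`, but fields holding other custom types would need their own conversion.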

Upvotes: 1
