testTester

Reputation: 2649

Spacy: save parsed model

I am using spaCy, a Python natural language processing library, to parse raw text into a richer object-oriented representation, specifically a dependency tree.

This takes a while to run: I need to load a very expensive model and then parse a very large quantity of text. I would like to save time on subsequent runs, so that I can iterate faster on handling the data once this initial parsing is done.

How can I "save" these results after the first run, and then reload these preprocessed versions faster in subsequent runs?

PICKLE: When I try to use pickle, I get the following error while unpickling the Doc/Token objects:

File "spacy/tokens/token.pyx", line 56, in spacy.tokens.token.Token.__cinit__ (spacy/tokens/token.cpp:3868)
TypeError: __cinit__() takes exactly 3 positional arguments (0 given)

Thanks.

Upvotes: 4

Views: 1411

Answers (1)

Emiel

Reputation: 388

I don't have a pickle solution, but I wrote this script in the past to store spaCy output as XML (in the NAF format).

Depending on your pipeline, you could also try storing the output in a CoNLL format (e.g. CoNLL-U). This makes your data interoperable with many other NLP tools, so you could swap in a different parser without changing the code that consumes the output.

I don't have example code for this, but the process should be similar.
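
As a rough starting point, a CoNLL-U style export might look something like the sketch below. It only relies on token attributes that spaCy exposes (`i`, `text`, `lemma_`, `pos_`, `tag_`, `dep_`, `head`) and on `doc.sents`; the function names `sentence_to_conllu` and `doc_to_conllu` are made up for this example. Note that spaCy's dependency labels are not necessarily Universal Dependencies labels, so the result has the CoNLL-U column layout but may not pass a strict validator, and whitespace tokens are not handled specially.

```python
def sentence_to_conllu(tokens):
    """Format one sentence (a sequence of spaCy-like tokens) as CoNLL-U rows.

    Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    FEATS, DEPS and MISC are left as "_" in this sketch.
    """
    tokens = list(tokens)
    # Map the document-level token index to a 1-based position in the sentence.
    position = {tok.i: n for n, tok in enumerate(tokens, start=1)}
    rows = []
    for tok in tokens:
        # spaCy marks the root by making a token its own head; CoNLL-U uses 0.
        head = 0 if tok.head.i == tok.i else position[tok.head.i]
        fields = [
            str(position[tok.i]),
            tok.text,
            tok.lemma_ or "_",
            tok.pos_ or "_",
            tok.tag_ or "_",
            "_",
            str(head),
            tok.dep_ or "_",
            "_",
            "_",
        ]
        rows.append("\t".join(fields))
    return "\n".join(rows)


def doc_to_conllu(doc):
    """Join all sentences of a parsed document, one blank line after each."""
    return "".join(sentence_to_conllu(sent) + "\n\n" for sent in doc.sents)
```

With a loaded pipeline `nlp` that includes a parser, you would write `doc_to_conllu(nlp(text))` to a file once and read that file in later runs instead of re-parsing. Reading it back gives you plain rows of strings rather than spaCy `Doc` objects, which is the trade-off of this approach.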

Upvotes: 0
