Reputation: 615
I have about 90 documents that I have processed with spaCy.
import os
import spacy

nlp = spacy.load('de')
# Number the files from 1 so each document is written to its own path.
for index, document in enumerate(doc_collection, start=1):
    doc = nlp(document)
    doc.to_disk('doc_folder/' + str(index))
This seems to work fine. Later, I want to reload the saved Doc files through a generator.
def get_spacy_doc_list():
    for file in os.listdir(directory):
        filename = os.fsdecode(file)
        yield spacy.tokens.Doc(spacy.vocab.Vocab()).from_disk('doc_folder/' + filename)

for doc in get_spacy_doc_list():
    for token in doc:
        print(token.lemma_)
When I run this, I get the following error:
KeyError: "[E018] Can't retrieve string for hash '12397158900972795331'."
How can I store and load spaCy Doc objects without getting this error? Thanks for your help!
Upvotes: 1
Views: 1730
Reputation: 615
Found the solution. The problem is this line, which builds the Doc on a brand-new, empty Vocab:
yield spacy.tokens.Doc(spacy.vocab.Vocab()).from_disk('doc_folder/' + filename)
The Vocab instance must be the one from your nlp object, since a fresh Vocab() does not contain the strings behind the saved hashes:
yield spacy.tokens.Doc(nlp.vocab).from_disk('doc_folder/' + filename)
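For anyone wondering why the empty Vocab triggers E018, here is a rough toy model of the idea using only the standard library. It is not spaCy's actual implementation: `ToyStringStore`, `add`, and `lookup` are names made up for this sketch, and the hash function is an arbitrary stand-in. It only illustrates that a saved hash can be turned back into text by a store that has seen the string, and not by a fresh one.

```python
# Toy model only: ToyStringStore is a made-up class, not spaCy's StringStore.
import hashlib


class ToyStringStore:
    """Maps 64-bit hashes back to the strings that were added to it."""

    def __init__(self):
        self._by_hash = {}

    def add(self, text):
        # Derive a stable 64-bit integer key from the string.
        digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "big")
        self._by_hash[key] = text
        return key

    def lookup(self, key):
        # Raises KeyError if this store never saw the string.
        return self._by_hash[key]


# Plays the role of nlp.vocab: it saw every string while processing.
pipeline_store = ToyStringStore()
saved_hashes = [pipeline_store.add(word) for word in ["Das", "ist", "gut"]]

# Looking the hashes up in the store that created them works.
restored = [pipeline_store.lookup(h) for h in saved_hashes]

# A brand-new store (like Vocab()) has never seen these strings.
empty_store = ToyStringStore()
try:
    empty_store.lookup(saved_hashes[0])
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In this sketch `restored` holds the original words again, while the lookup against `empty_store` raises a KeyError, which is the same kind of failure as in the question.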
Upvotes: 4