felipeformenti

Reputation: 177

Loading a spaCy .tar.gz model artifact from S3 in SageMaker

I have a pretrained spaCy model in a local folder that I can easily load with m = spacy.load("path/model/").

But now I have to upload it as a .tar.gz file to use as a SageMaker model artifact. How can I read this .tar.gz file?

Ideally I want to read the unzipped folder from memory, without extracting everything to disk and then reading it again.

My question is almost a duplicate of this one: "Directly load spacy model from packaged tar.gz file". But the answers there don't explain how to decompress and untar the folder into memory.

Upvotes: 1

Views: 670

Answers (2)

felipeformenti

Reputation: 177

It turns out SageMaker already decompresses the .tar.gz file automatically, so I can just load the folder exactly like before.
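Outside SageMaker (for example when testing the same artifact locally), the unpacking step has to be done by hand. Here is a minimal sketch using only the standard library. The helper name extract_model_archive and the file name model.tar.gz are made up for illustration, and it assumes the model files sit at the root of the archive rather than inside a subfolder.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, dest_dir):
    """Unpack a .tar.gz model artifact into dest_dir and return dest_dir as a Path.

    Hypothetical helper for illustration; not part of spaCy or SageMaker.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


# Possible usage (assumes spaCy is installed and model.tar.gz exists):
#
#   import tempfile
#   import spacy
#
#   with tempfile.TemporaryDirectory() as tmp:
#       model_dir = extract_model_archive("model.tar.gz", tmp)
#       nlp = spacy.load(model_dir)
```

This still writes to disk, but into a temporary directory that is removed once the model is loaded, which is usually close enough to "in memory" for this purpose.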

Upvotes: 0

polm23

Reputation: 15593

Take a look at the serialization docs. You don't want to read the unzipped folder from memory (I'm not sure how that would work exactly), but you can use simple in-memory serialization, for example. In that case you save the config and the model separately.

To save:

# Keep the pipeline config and the serialized pipeline data as two separate objects.
config = nlp.config
bytes_data = nlp.to_bytes()

To read back:

import spacy

# Rebuild an empty pipeline from the config, then load the serialized data into it.
lang_cls = spacy.util.get_lang_class(config["nlp"]["lang"])
nlp = lang_cls.from_config(config)
nlp.from_bytes(bytes_data)

Upvotes: 0
