angryweasel

Reputation: 366

Shared memory among processes for pre-trained word2vec model?

I have a look-up object, specifically a pre-trained word2vec model loaded as a gensim.models.keyedvectors.Word2VecKeyedVectors. I need to do some data pre-processing, and I am using multiprocessing for it. Is there a way for all of my processes to use the object from the same memory location, instead of each process loading its own copy into memory?

Upvotes: 0

Views: 246

Answers (2)

gojomo

Reputation: 54173

Yes, if:

  • the files were saved using Gensim's native .save() method, and the large arrays of vectors ended up in separate .npy files
  • the files are loaded using Gensim's native .load() method, with the mmap option (e.g. mmap='r')
  • you avoid any operations that inadvertently cause each process's object to reallocate the backing array entirely (which breaks the mmap-sharing).

See this prior answer for an overview of the steps and concerns for a similar need.

(The concern and extra steps described there for avoiding breaking the mmap-sharing, namely manual patch-ups of the norm properties, should no longer be necessary in Gensim 4.0.0, currently available only as a prerelease version.)
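
Roughly, the difference is that Gensim 3.x's init_sims() built a full-size normalized copy of the vectors in each process, while 4.0 keeps only a small per-word array of norms. The NumPy-only sketch below (not Gensim's own code; the sizes and file name are arbitrary) contrasts the two strategies on a read-only memory-mapped array:

```python
import os
import tempfile

import numpy as np


def compare_norm_strategies(n_words=1000, dim=50):
    """Contrast a full normalized copy with a norms-only array, both
    derived from a read-only memory-mapped vectors file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "vectors.npy")
        rng = np.random.default_rng(0)
        np.save(path, rng.random((n_words, dim), dtype=np.float32))

        vectors = np.load(path, mmap_mode="r")  # read-only np.memmap

        # Full normalized copy: same size as `vectors`, freshly allocated,
        # so every process that builds one pays for it privately.
        normed_copy = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

        # Norms-only: one number per word; lookups can keep reading the shared
        # mapping and divide on the fly, e.g. vectors[i] / norms[i].
        norms = np.linalg.norm(vectors, axis=1)

        report = {
            "is_memmap": isinstance(vectors, np.memmap),
            "copy_shares_mapping": bool(np.shares_memory(normed_copy, vectors)),
            "copy_bytes": normed_copy.nbytes,
            "norms_bytes": norms.nbytes,
        }
        del vectors  # drop the mapping before the temp dir is removed
    return report


if __name__ == "__main__":
    print(compare_norm_strategies())
```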

Upvotes: 1

David M.

Reputation: 4588

Yes, here are two options:

Upvotes: 1
