Flora

Reputation: 31

How can a Word2Vec pretrained model be loaded in Gensim faster?

I'm loading the model using:

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) 

Now every time I run the file in PyCharm, it loads the model again.

So, is there a way to load it once and have it available whenever I run things like model['king'] and model.doesnt_match("house garage store dog".split())?

It takes a lot of time whenever I want to check similarity or find words that don't match. When I ran model.most_similar('finance') it was really slow and the whole laptop froze for about 2 minutes. So, is there a way to make things faster? I want to use it in my project, but I can't let the user wait this long.

Any suggestions?

Upvotes: 3

Views: 2196

Answers (1)

gojomo

Reputation: 54153

That's a set of word-vectors that's about 3.6GB on disk, and slightly larger when loaded - so just the disk IO can take a noticeable amount of time.

Also, at least until gensim-4.0.0 (now available as a beta preview), versions of Gensim through 3.8.3 require an extra one-time pre-calculation of unit-length-normalized vectors upon the very first use of a .most_similar() or .doesnt_match() operation (& others). This step can also take a noticeable moment, & then immediately requires a few extra GB of memory for a full model like GoogleNews - which on any machine with less than about 8GB of free RAM risks using slower virtual-memory or even crashing with an out-of-memory error. (Starting in gensim-4.0.0beta, once the model loads, the 1st .most_similar() won't need any extra pre-calculation/allocation.)

The main way to avoid this annoying lag is to structure your code or service to not reload it separately before each calculation. Typically, this means keeping an interactive Python process that's loaded it alive, ready for your extra operations (or later user requests, as might be the case with a web-deployed service.)

It sounds like you may be developing a single Python script, something like mystuff.py, and running it via PyCharm's execute/debug/etc utilities for launching a Python file. Unfortunately, upon each completed execution, that will let the whole Python process end, releasing any loaded data/objects completely. Running the script again must do all the loading/precalculation again.

If your main interest is doing a bit of investigational examination & experimentation with the set of word-vectors, on your own, a big improvement would be to move to an interactive environment that keeps a single Python run alive & waiting for your next line of code.

For example, if you run the ipython interpreter at a command-line, in a separate shell, you can load the model, do a few lookup/similarity operations to print the results, and then just leave the prompt waiting for your next code. The full loaded state of the process remains available until you choose to exit the interpreter.
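
For example, a minimal sketch of such a session (assuming the GoogleNews .bin file is in your working directory) might be:

from gensim.models import KeyedVectors

# slow, one-time load, paid only once for the life of this interpreter session
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# later queries re-use the already-loaded vectors, so they return without the load delay
model.most_similar('finance', topn=5)
model.doesnt_match("house garage store dog".split())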

Similarly, if you use a Jupyter Notebook inside a web-browser, you get that same interpreter experience inside a growing set of editable-code-and-result 'cells' that you can re-run. All are sharing the same back-end interpreter process, with persistent state – unless you choose to restart the 'kernel'.

If you're providing a script or library code for your users' investigational work, they could also use such persistent interpreters.

But if you're building a web service or other persistently-running tool, you'd similarly want to make sure that the model remains loaded between user requests. (Exactly how you'd do that would depend on the details of your deployment, including web server software, so it'd be best to ask/search-for that as a separate question supplying more details when you're at that step.)
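
For instance, here's a rough sketch of that pattern using Flask (purely illustrative: the framework and route name are my assumptions, not anything from your setup). The model is loaded once at process start, then re-used for every request:

from flask import Flask, jsonify, request
from gensim.models import KeyedVectors

app = Flask(__name__)

# loaded once when the server process starts, then shared across all requests
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

@app.route('/similar')
def similar():
    word = request.args.get('word', 'finance')
    # cast scores to plain floats so they serialize cleanly as JSON
    return jsonify([(w, float(s)) for w, s in model.most_similar(word, topn=10)])

if __name__ == '__main__':
    app.run()  # the process stays alive, so the model stays loaded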

There is one other trick that may help in your constant-relaunch scenario. Gensim can save & load in its own native format, which can make use of 'memory-mapping'. Essentially, a range of a file on-disk can be used directly by the operating-system's virtual memory system. Then, when many processes all designate the same file as the canonical version of something they want in their own memory-space, the OS knows they can re-use any parts of that file that are already in memory.

This technique works far more simply in gensim-4.0.0beta and later, so I'm only going to describe the steps needed there. (See this message if you want to force this preview installation before Gensim 4.0 is officially released.)

First, load the original-format file, but then re-save it in Gensim's format:

from gensim.models import KeyedVectors
kv_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) 
kv_model.save('GoogleNews-vectors-negative300.kv')

Note that there will be an extra .npy file created that must be kept alongside GoogleNews-vectors-negative300.kv if you move the model elsewhere. Do this re-save only once to create the new files.

Second, when you later need the model, use Gensim's .load() with the mmap option:

kv_model = KeyedVectors.load('GoogleNews-vectors-negative300.kv', mmap='r')
# do your other operations

Right away, the .load() should complete faster. However, when you 1st try to access any word – or all words in a .most_similar() – the read from disk will still need to happen, just shifting the delays to later. (If you're only ever doing individual-word lookups or small sets of .doesnt_match() words, you may not notice any long lags.)

Further, depending on your OS & amount-of-RAM, you might even get some speedup when you run your script once, let it finish, then run it again soon after. It's possible in some cases that even though the OS has ended the prior process, its virtual-memory machinery remembers that some of the not-yet-cleared old-process memory pages are still in RAM, & correspond to the memory-mapped file. Thus, the next memory-map will re-use them. (I'm not sure of this effect, and if you're in a low-memory situation the chance of such re-use from a completed process may disappear completely.)

But, you could increase the chances of the model file staying memory-resident by taking a third step: launch a separate Python process that preloads the model and doesn't exit until killed. To do this, make another Python script like preload.py:

from gensim.models import KeyedVectors
from threading import Semaphore
model = KeyedVectors.load('GoogleNews-vectors-negative300.kv', mmap='r')
model.most_similar('stuff')  # any word will do: just to page all in
Semaphore(0).acquire()  # just hang until process killed

Run this script in a separate shell: python preload.py. It will map the model into memory, but then hang until you CTRL-C exit it.

Now, any other code you run on the same machine that memory-maps the same file will automatically re-use any already-loaded memory pages from this separate process. (In low-memory conditions, if any other virtual-memory is being relied upon, ranges could still be flushed out of RAM. But if you have plentiful RAM, this will ensure minimal disk IO each new time the same file is referenced.)
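
As a rough sanity-check (just a sketch; exact numbers will vary by machine and OS caching), you can time the load and the first full-set search on a fresh run while preload.py is still running in its own shell:

import time
from gensim.models import KeyedVectors

t0 = time.perf_counter()
kv_model = KeyedVectors.load('GoogleNews-vectors-negative300.kv', mmap='r')
print('load:', time.perf_counter() - t0, 'seconds')

t0 = time.perf_counter()
kv_model.most_similar('finance')  # touches every vector page, forcing any remaining disk reads
print('first most_similar:', time.perf_counter() - t0, 'seconds')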

Finally, one other option that can be mixed with any of these is to load only a subset of the full 3-million-token, 3.6GB GoogleNews set. The less-common words are near the end of the file, and skipping them won't affect many uses. So you can use the limit argument of load_word2vec_format() to load only a subset - which loads faster, uses less memory, and makes later full-set searches (like .most_similar()) faster. For example, to load just the 1st 1,000,000 words for about 67% savings of RAM/load-time/search-time:

from gensim.models import KeyedVectors
kv_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', limit=1000000, binary=True)
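
And if you want to mix this with the memory-mapping trick above, you could re-save the trimmed set in Gensim's native format, so later runs can .load(..., mmap='r') the much smaller file. (The .kv filename below is just an example; do this re-save only once.)

from gensim.models import KeyedVectors

kv_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', limit=1000000, binary=True)
kv_model.save('GoogleNews-vectors-1M.kv')  # example filename; keep any extra .npy file alongside it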

Upvotes: 1
