Reputation: 117
I get a memory error when I load GoogleNews-vectors-negative300.bin or when I try to train a model with Gensim on a Wikipedia dataset corpus (1 GB). I have 4 GB of RAM in my system. Is there any way to work around this?
Could we host it on a cloud service like AWS to get better speed?
Upvotes: 2
Views: 1960
Reputation: 658
It is really difficult to load the entire GoogleNews pre-trained vector model. I was able to load only about 50,000 vectors (i.e. roughly 1/60th of the total) on my 8 GB Ubuntu machine using a Jupyter Notebook, and, as expected, memory/resource usage went through the roof.
So it is safest to have at least 16 GB to load the entire model; otherwise pass limit=30000 as a parameter, as suggested by @gojomo. A minimal sketch of that is shown below.
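This is a rough sketch of that memory-limited load, assuming a recent gensim and that the .bin file sits in the working directory (the example word "computer" is arbitrary):

```python
from gensim.models import KeyedVectors

# Load only the first 30,000 vectors instead of the full 3 million;
# binary=True matches the .bin format of the GoogleNews file.
kv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin',
    binary=True,
    limit=30000,
)

print(kv['computer'].shape)  # each vector has 300 dimensions
```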
Upvotes: 0
Reputation: 22624
There is no way to get away with 4 GB. I could load and compute with GoogleNews-vectors-negative300.bin on my 8 GB MacBook Pro. However, when I loaded this gigantic pretrained vector set on AWS, I had to upgrade the instance to 16 GB of RAM because it was serving a web app at the same time. So, basically, if you want to use it behind a web app with a safety margin, you would need 16 GB.
Upvotes: 0
Reputation: 54153
4 GB is very tight for that vector set; you should have 8 GB or more to load the full set. Alternatively, you could use the optional limit argument to load_word2vec_format() to load just some of the vectors. For example, limit=500000 would load just the first 500,000 (instead of the full 3 million). As the file appears to put the more-frequently-appearing tokens first, that may be sufficient for many purposes.
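A minimal sketch of that approach, assuming gensim is installed and the .bin file path is correct (the query word "king" is just an illustration):

```python
from gensim.models import KeyedVectors

# Load only the first 500,000 vectors; since the file lists more-frequent
# tokens first, common words should still be covered.
kv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin',
    binary=True,
    limit=500000,
)

# Frequent words remain available in the truncated vocabulary.
print(kv.most_similar('king', topn=5))
```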
Upvotes: 6