I am having trouble loading the pretrained word2vec (w2v) embeddings from Google News.
I downloaded GoogleNews-vectors-negative300.bin.gz
and after running
gensim.models.KeyedVectors.load_word2vec_format('/home/slava/GoogleNews-vectors-negative300.bin.gz', binary=True)
I got the error
IOError: Not a gzipped file
Okay, so I ran
gzip GoogleNews-vectors-negative300.bin
in the console, and
file GoogleNews-vectors-negative300.bin.gz
now reports that it really is gzip compressed data.
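The check that `file` performs can also be reproduced from Python, which makes it easy to see the on-disk size at the same time. The following is only a minimal sketch: the helper names `looks_gzipped` and `describe` are made up for illustration and are not part of gensim.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def describe(path):
    """Report the on-disk size and whether the file looks gzip compressed."""
    return {"size_bytes": os.path.getsize(path), "gzipped": looks_gzipped(path)}


# Example call (path taken from the question):
# print(describe('/home/slava/GoogleNews-vectors-negative300.bin.gz'))
```

A file that passes this check can still hold an empty or truncated payload, so the reported size is worth comparing against the size advertised at the download source.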
But running
gensim.models.KeyedVectors.load_word2vec_format('/home/slava/GoogleNews-vectors-negative300.bin.gz', binary=True)
now raises
ValueError: need more than 0 values to unpack
Full traceback:
> ValueError                                Traceback (most recent call last)
> <ipython-input-9-c4eebc3bcdb0> in <module>()
>       1
>       2 from gensim.models import Word2Vec
> ----> 3 model = gensim.models.KeyedVectors.load_word2vec_format('/home/slava/GoogleNews-vectors-negative300.bin.gz', binary=True)
>
> /home/slava/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.pyc in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
>     205         with utils.smart_open(fname) as fin:
>     206             header = utils.to_unicode(fin.readline(), encoding=encoding)
> --> 207             vocab_size, vector_size = map(int, header.split())  # throws for invalid file format
>     208             if limit:
>     209                 vocab_size = min(vocab_size, limit)
>
> ValueError: need more than 0 values to unpack
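The failing line expects the first line of the file to contain two integers, the vocabulary size and the vector size. "need more than 0 values to unpack" means `header.split()` returned an empty list, so the first line read was empty or only whitespace. That step can be reproduced outside gensim with a minimal sketch like the one below; `read_w2v_header` is a hypothetical helper that uses only the standard library and assumes a `.gz` suffix means gzip compression.

```python
import gzip


def read_w2v_header(path):
    """Read the first line of a word2vec-format file as (vocab_size, vector_size).

    Mirrors the header-parsing step shown in the traceback, but with an
    explicit error message when the header does not have two fields.
    """
    # Assumption: files ending in .gz are gzip compressed, everything else is raw.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fin:
        header = fin.readline().decode("utf8")
    fields = header.split()
    if len(fields) != 2:
        raise ValueError("expected 'vocab_size vector_size', got %r" % header)
    vocab_size, vector_size = map(int, fields)
    return vocab_size, vector_size


# Example call (path taken from the question):
# print(read_w2v_header('/home/slava/GoogleNews-vectors-negative300.bin.gz'))
```

If this raises for the file in question, the decompressed content starts with a blank line or is empty, which would be consistent with an incomplete download, although that is only a guess.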
How can I fix this?