Reputation: 773
I am trying to load the google_news_vectors.bin file, but it gives an error. Below is my code, which is in the file nlp_gen2.py:
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('google_news_vectors.bin', binary=True)
The error I get is:
FileNotFoundError Traceback (most recent call last) in 1 import gensim
----> 2 model = gensim.models.KeyedVectors.load_word2vec_format('google_news_vectors.bin',
binary=True)
C:\Anaconda3\envs\DataScience\lib\site-packages\gensim\models\keyedvectors.py
in load_word2vec_format(cls, fname, fvocab, binary, encoding,
unicode_errors, limit, datatype) 1547 return _load_word2vec_format(
1548 cls, fname, fvocab=fvocab, binary=binary, encoding=encoding,
unicode_errors=unicode_errors, -> 1549 limit=limit, datatype=datatype)
1550 1551 @classmethod
C:\Anaconda3\envs\DataScience\lib\site-packages\gensim\models\utils_any2vec.py
in _load_word2vec_format(cls, fname, fvocab, binary, encoding,
unicode_errors, limit, datatype, binary_chunk_size) 273 274
logger.info("loading projection weights from %s", fname) --> 275 with
utils.open(fname, 'rb') as fin: 276 header =
utils.to_unicode(fin.readline(), encoding=encoding) 277 vocab_size,
vector_size = (int(x) for x in header.split()) # throws for invalid
file format
C:\Anaconda3\envs\DataScience\lib\site-packages\smart_open\smart_open_lib.py
in open(uri, mode, buffering, encoding, errors, newline, closefd,
opener, ignore_ext, transport_params) 185 encoding=encoding, 186
errors=errors, --> 187 newline=newline, 188 ) 189 if fobj is not None:
C:\Anaconda3\envs\DataScience\lib\site-packages\smart_open\smart_open_lib.py
in _shortcut_open(uri, mode, ignore_ext, buffering, encoding, errors,
newline) 285 open_kwargs['errors'] = errors 286 --> 287 return
_builtin_open(local_path, mode, buffering=buffering, **open_kwargs) 288 289
FileNotFoundError: [Errno 2] No such file or directory:
'google_news_vectors.bin'
My file structure is shown below:
How can I solve this?
Upvotes: 0
Views: 2951
Reputation: 3517
Your question does not show clearly how your files are named, since your Explorer does not show file extensions. See this guide to turn them on.
For some reason you have a folder named GoogleNews-vectors-negative300.bin. This should not be the case.
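One way to check what Python actually sees at that name is sketched below, using only the standard library. The helper name `describe_path` is made up for illustration. Note that a relative name is resolved against the current working directory, which may not be the folder that contains the script.

```python
from pathlib import Path


def describe_path(name):
    """Report whether `name` is a file, a directory, or missing.

    `describe_path` is a hypothetical helper, not part of gensim.
    """
    path = Path(name)
    # Relative names are resolved against the current working directory.
    absolute = path if path.is_absolute() else Path.cwd() / path
    if absolute.is_dir():
        kind = "directory"
    elif absolute.is_file():
        kind = "file"
    else:
        kind = "missing"
    return kind, absolute


if __name__ == "__main__":
    kind, absolute = describe_path("GoogleNews-vectors-negative300.bin")
    print(f"{absolute} -> {kind}")
```

If this reports "directory" or "missing", the loader cannot open the name as a file, which is consistent with the error in the question.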
Download GoogleNews-vectors-negative300.bin.gz. It should be exactly 1647046227 bytes, and its MD5 is 1c892c4707a8a1a508b01a01735c0339.
Confirm the file size by inspecting the file properties.
Uncompress the file. It seems that you have WinRAR installed, and it should be able to perform the gunzip operation.
You should now have a file GoogleNews-vectors-negative300.bin of 3644258522 bytes, with MD5 023bfd73698638bdad5f84df53404c8b.
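If you would rather script the decompression and the size/MD5 checks than use WinRAR and the file properties dialog, here is a minimal standard-library sketch. The function names `gunzip` and `size_and_md5` are hypothetical, and it assumes both files sit in the current working directory.

```python
import gzip
import hashlib
import os
import shutil


def gunzip(src, dst, chunk_size=1024 * 1024):
    """Decompress the gzip file `src` into `dst`, streaming in chunks."""
    # `dst` must not already exist as a folder, or open() will fail.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)


def size_and_md5(path, chunk_size=1024 * 1024):
    """Return (size in bytes, hex MD5 digest) of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


if __name__ == "__main__":
    gz_name = "GoogleNews-vectors-negative300.bin.gz"
    bin_name = "GoogleNews-vectors-negative300.bin"
    if os.path.isfile(gz_name):
        print(size_and_md5(gz_name))   # compare with the values quoted above
        gunzip(gz_name, bin_name)
        print(size_and_md5(bin_name))  # compare with the values quoted above
```

A mismatch in either number suggests an incomplete or corrupted download.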
Now, the following code should work:
import gensim
filename = 'GoogleNews-vectors-negative300.bin'
model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=True)
Alternatively, gensim can read the compressed file directly, so you can skip the decompression step:
Download GoogleNews-vectors-negative300.bin.gz. It should be exactly 1647046227 bytes, and its MD5 is 1c892c4707a8a1a508b01a01735c0339.
Confirm the file size by inspecting the file properties.
Now, the following code should work:
import gensim
filename = 'GoogleNews-vectors-negative300.bin.gz'
model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=True)
This answer is based on this Colab.
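Since the original error came from a relative file name, it may also help to build the path explicitly and fail early with a clearer message. This is only a sketch: `find_vectors` and `load_vectors` are hypothetical helpers, and `base_dir` is assumed to be the folder where you saved the download.

```python
from pathlib import Path


def find_vectors(filename, base_dir=None):
    """Return an absolute path to `filename`, or raise a descriptive error.

    Hypothetical helper: `base_dir` defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / filename).resolve()
    if path.is_dir():
        raise IsADirectoryError(f"{path} is a folder, not the vectors file")
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    return path


def load_vectors(filename, base_dir=None):
    """Load word2vec-format vectors with gensim from an explicit location."""
    # Imported here so the path helper works even without gensim installed.
    from gensim.models import KeyedVectors

    path = find_vectors(filename, base_dir)
    return KeyedVectors.load_word2vec_format(str(path), binary=True)
```

For example, `load_vectors('GoogleNews-vectors-negative300.bin', base_dir=...)` with the download folder passed as `base_dir` avoids depending on where the interpreter was started.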
Upvotes: 0
Reputation: 603
The file name is "GoogleNews-vectors-negative300.bin", but as you can see, the file is corrupted. Download and unpack the rar again.
Upvotes: 1