How to load a word2vec model from a zip file that has no .bin file inside?

Question

I am trying out the webvectors project. This code works fine:

import zipfile

import gensim

nlpl_zip = "C:/180.zip"
with zipfile.ZipFile(nlpl_zip, "r") as archive:
    # Open the binary word2vec file stored inside the archive
    stream = archive.open("model.bin")
    model = gensim.models.KeyedVectors.load_word2vec_format(
        stream, binary=True, unicode_errors='replace'
    )

But when I download the model from http://vectors.nlpl.eu/repository/20/212.zip and save it as C:/212.zip, the same code does not work, because there is no model.bin inside. The archive contains only these files:

[Image: list of the files inside the archive]
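
The same listing can be produced as text with the standard zipfile module. This is a minimal sketch, not part of the original code: the helper name list_archive_members is hypothetical, and the path in the usage comment is the one mentioned above.

```python
import zipfile


def list_archive_members(zip_path):
    """Return (name, uncompressed size in bytes) for every file in the archive."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Usage (assumes the archive was saved under this path):
# for name, size in list_archive_members("C:/212.zip"):
#     print(f"{size:>12}  {name}")
```

Pasting that output into the question makes the archive contents searchable and avoids depending on a screenshot.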

When I try to open one of those files instead:

stream = archive.open("model.ckpt.data-00000-of-00001")

I get the following error. What am I doing wrong?

UnicodeDecodeError Traceback (most recent call last)
Cell In[11], line 9
7 with zipfile.ZipFile(model_file, 'r') as archive:
8 stream = archive.open('model.ckpt.data-00000-of-00001')
9 model = gensim.models.KeyedVectors.load_word2vec_format(stream, binary=True,unicode_errors='replace')

File C:\ProgramData\anaconda3\lib\site-packages\gensim\models\keyedvectors.py:1719, in KeyedVectors.load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header)
1672 @classmethod
1673 def load_word2vec_format(
1674 cls, fname, fvocab=None, binary=False, encoding='utf8', unicode_errors='strict',
1675 limit=None, datatype=REAL, no_header=False,
1676 ):
1677 """Load KeyedVectors from a file produced by the original C word2vec-tool format.
1678
1679 Warnings
    (...)
1717
1718 """
1719 return _load_word2vec_format(
1720 cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
1721 limit=limit, datatype=datatype, no_header=no_header,
1722 )

File C:\ProgramData\anaconda3\lib\site-packages\gensim\models\keyedvectors.py:2058, in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header, binary_chunk_size)
2056 fin = utils.open(fname, 'rb')
2057 else:
2058 header = utils.to_unicode(fin.readline(), encoding=encoding)
2059 vocab_size, vector_size = [int(x) for x in header.split()] # throws for invalid file format
2060 if limit:

File C:\ProgramData\anaconda3\lib\site-packages\gensim\utils.py:365, in any2unicode(text, encoding, errors)
363 if isinstance(text, str):
364 return text
365 return str(text, encoding, errors=errors)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 1: invalid continuation byte
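
The traceback shows the failure happening while the header is read: the loader takes the first line of the file and expects two integers, the vocabulary size and the vector size. A file in another format fails at exactly this point. The name model.ckpt.data-00000-of-00001 follows the pattern TensorFlow uses for checkpoint data shards, so this file may not be a word2vec-format model at all. The sketch below checks a zip member for that header before it is passed to gensim. It is only an illustration: read_word2vec_header is a hypothetical helper, not a gensim function.

```python
import zipfile


def read_word2vec_header(zip_path, member):
    """Return (vocab_size, vector_size) if the member starts with a
    word2vec-style header line, otherwise None."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        with archive.open(member) as stream:
            first_line = stream.readline()
    try:
        # A word2vec file begins with "<vocab_size> <vector_size>\n"
        fields = first_line.decode("utf-8").split()
        vocab_size, vector_size = (int(x) for x in fields)
    except (UnicodeDecodeError, ValueError):
        # Not decodable as text, or not exactly two integers
        return None
    return vocab_size, vector_size


# Usage (assumes the archive path from the question):
# print(read_word2vec_header("C:/212.zip", "model.ckpt.data-00000-of-00001"))
```

If the helper returns None for every member of the archive, none of them is in the format load_word2vec_format expects, and setting unicode_errors will not help.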

I have tried many approaches, but all of them failed.

