Reputation: 1
I am using FastText.load_fasttext_format() to load the official fastText pretrained Japanese model (300 dimensions) in Google Colab.
Here is my code.
from gensim.models import FastText

model_path = "/content/drive/MyDrive/IDR/rakuten/wikipedia_fastText/cc.ja.300.bin"
model = FastText.load_fasttext_format(model_path)
And here is the encoding error.
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-61d7c85f09b2> in <module>()
2
3 model_path = "/content/drive/MyDrive/IDR/rakuten/wikipedia_fastText/cc.ja.300.bin"
----> 4 model = FastText.load_fasttext_format(model_path)
2 frames
/usr/local/lib/python3.7/dist-packages/gensim/models/fasttext.py in _load_dict(self, file_handle, encoding)
818 word_bytes += char_byte
819 char_byte = file_handle.read(1)
--> 820 word = word_bytes.decode(encoding)
821 count, _ = self.struct_unpack(file_handle, '@qb')
822
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 0: unexpected end of data
Upvotes: 0
Views: 567
Reputation: 54233
The specific error seems to be "unexpected end of data".
Are you sure the cc.ja.300.bin file you've downloaded is the full, untruncated length, and that its contents are uncorrupted and match any checksum declared by the source you downloaded it from?
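One way to check this is to compute the file's size and checksum and compare them with whatever the download source publishes. A minimal sketch, assuming the source lists an MD5 digest (pass another algorithm name if it lists something else); file_digest is just an illustrative helper name, not part of Gensim or fastText:

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return (size_in_bytes, hex_digest) for the file at `path`.

    The file is read in chunks so a multi-gigabyte model does not
    have to fit in memory. `algorithm` is any name hashlib.new() accepts.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return os.path.getsize(path), hasher.hexdigest()
```

Calling file_digest(model_path) on the file in Colab and on a fresh download should give the same size and digest. If they differ, or the size is smaller than the source states, the copy is incomplete or corrupted and should be downloaded again.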
Separately, the load_fasttext_format() class method is deprecated in current versions of Gensim, with gensim.models.fasttext.load_facebook_model() now the preferred form (though this wouldn't account for your error).
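For reference, a small sketch of the newer call. The wrapper name load_facebook_fasttext and its up-front file check are my own additions for illustration; only load_facebook_model() itself comes from Gensim:

```python
import os


def load_facebook_fasttext(path):
    """Load a native fastText .bin model via Gensim's load_facebook_model()."""
    # Fail early with a clear message if the path is wrong.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No model file at {path!r}")
    # Imported lazily so the path check above runs even without Gensim installed.
    from gensim.models.fasttext import load_facebook_model
    return load_facebook_model(path)
```

With a complete file, model = load_facebook_fasttext(model_path) returns a FastText model whose vectors are available through model.wv. A truncated file will most likely still fail here, since the loader has to read the same vocabulary section.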
Upvotes: 1