Reputation: 310
I'm trying to print the words from a dataset of 8483448 bytes on Google Colab, but I'm getting this error:
words = list(model.wv.vocab)
print('this vocabulary for corpus')
print(words)
ERROR:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
Thanks for any help fixing this error.
Upvotes: 0
Views: 4002
Reputation: 54233
Given the error, you seem to be hitting a Google Colab-specific limit on the output size.
Try printing len(model.wv.vocab) first to get a sense of how large an output you're trying to display. It may not be practical to show in a notebook cell!
If you just need a peek at some of the large vocabulary, print a small subset instead, for example print(words[0:10]).
Note also: in the latest Gensim versions (>=4.0.0), the .vocab dictionary goes away. Instead, a list of all known tokens (words), usually in descending-frequency order, is available as model.wv.index_to_key. (So, in gensim-4.0.0 & up, you could look at the 100 most-frequent tokens with print(model.wv.index_to_key[0:100]).)
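To make this version difference concrete, here's a minimal sketch of a helper (the function name `top_tokens` is my own, not part of Gensim) that works with either Gensim API by checking which attribute the word-vector object exposes, and prints the vocabulary size before returning only a small slice:

```python
def top_tokens(wv, n=100):
    """Return up to n tokens from a Gensim word-vector object `wv`,
    without printing the entire (possibly huge) vocabulary."""
    if hasattr(wv, "index_to_key"):   # Gensim >= 4.0.0
        words = list(wv.index_to_key)
    else:                             # Gensim < 4.0.0 keeps a .vocab dict
        words = list(wv.vocab)
    print(f"vocabulary size: {len(words)}")
    return words[:n]
```

You'd call it as `top_tokens(model.wv, 10)`; since only a short list reaches the notebook output, the IOPub data-rate limit is never hit.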
Upvotes: 2