Reputation: 21
I am trying to convert a gensim KeyedVectors word2vec object to a tsv file. Here is my code:
from gensim.models import KeyedVectors
wv_embeddings = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True, limit=100000)
Do I need to loop through each of the embeddings and write them to the tsv file myself?
Upvotes: 1
Views: 1029
Reputation: 1827
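load_word2vec_format already returns a KeyedVectors object, so no .wv prefix is needed. In gensim 4.0 and later the vocabulary is available as wv_embeddings.index_to_key (on gensim 3.x, use wv_embeddings.vocab.keys() instead), and wv_embeddings.get_vector(word) returns the vector for a given word. The tsv can be written with the standard csv module:
import csv

# newline='' lets the csv module control the line endings itself
with open('wv_embeddings.tsv', 'w', newline='', encoding='utf-8') as tsvfile:
    writer = csv.writer(tsvfile, delimiter='\t')
    # index_to_key needs gensim >= 4.0; on 3.x use wv_embeddings.vocab.keys()
    for word in wv_embeddings.index_to_key:
        # convert the NumPy vector to a plain list of floats
        vector = wv_embeddings.get_vector(word).tolist()
        # one row per word: the word, then its vector components
        writer.writerow([word] + vector)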
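If it helps to check the output format before running this on the full 100000-word model, here is a minimal sketch of the same loop as a reusable function. The function name write_embeddings_tsv and the toy words and values are made up for illustration; the only assumption is that the vectors come in the same order as the words.

```python
import csv

def write_embeddings_tsv(words, vectors, path):
    """Write one row per word: the word, then its vector components."""
    with open(path, 'w', newline='', encoding='utf-8') as tsvfile:
        writer = csv.writer(tsvfile, delimiter='\t')
        for word, vector in zip(words, vectors):
            # float() turns NumPy scalars into plain Python floats
            writer.writerow([word] + [float(x) for x in vector])

# Toy stand-in for real embeddings (hypothetical values, not from the model)
toy_words = ['king', 'queen']
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
write_embeddings_tsv(toy_words, toy_vectors, 'toy_embeddings.tsv')
```

With gensim 4.0 or later, it should be possible to call it as write_embeddings_tsv(wv_embeddings.index_to_key, wv_embeddings.vectors, 'wv_embeddings.tsv'), since the rows of the vectors array are stored in the same order as index_to_key. Passing the whole array avoids a get_vector lookup per word.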
Upvotes: 1