K.EL

Reputation: 65

How to project my word2vec model in Tensorflow

I am new to using word embeddings and want to know how I can project my model in TensorFlow. I was looking at the TensorFlow website, and its projector only accepts TSV files (vectors/metadata), but I don't know how to generate the required TSV files. I have tried looking it up and can't find any solutions regarding this. If I try saving my model in TSV format, will I need to do some transformations? Any help will be appreciated.

I have saved my model as the following files, and just load it up when I need to use it:

word2vec.model

word2vec.model.wv.vectors.npy

Upvotes: 2

Views: 1199

Answers (1)

bivouac0

Reputation: 2560

Assuming you're trying to load some pre-trained Gensim word embeddings into a model, you can do this directly with the following code:

import numpy
import tensorflow as tf
from   gensim.models import KeyedVectors

# Load the word-vector model
wvec_fn = 'wvecs.kv'
wvecs = KeyedVectors.load(wvec_fn, mmap='r')
vec_size = wvecs.vector_size
vocab_size = len(wvecs.vocab)

# Create the embedding matrix where words are indexed alphabetically
embedding_mat = numpy.zeros(shape=(vocab_size, vec_size), dtype='float32')
for idx, word in enumerate(sorted(wvecs.vocab)):
    embedding_mat[idx] = wvecs.get_vector(word)

# Setup the embedding matrix for tensorflow
with tf.variable_scope("input_layer"):
    embedding_tf = tf.get_variable(
       "embedding", [vocab_size, vec_size],
        initializer=tf.constant_initializer(embedding_mat),
        trainable=False)

# Integrate this into your model
batch_size = 32     # just for example
seq_length = 20
input_data = tf.placeholder(tf.int32, [batch_size, seq_length])
inputs = tf.nn.embedding_lookup(embedding_tf, input_data)

If you've saved a full model instead of just the KeyedVectors, you may need to modify the code to load the model and then access the KeyedVectors with model.wv.

Upvotes: 3
