Reputation: 805
I am trying to code a simple neural machine translation model using TensorFlow, but I am a little stuck on understanding embeddings in TensorFlow:
tf.contrib.layers.embed_sequence(inputs, vocab_size=target_vocab_size, embed_dim=decoding_embedding_size)
and
dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
In which case should I use one over the other?
Upvotes: 1
Views: 369
Reputation: 61
I suppose that you're coming from this seq2seq tutorial. Even though this question is starting to get old, I'll try to answer for people passing by like me:
tf.contrib.layers.embed_sequence is actually a wrapper around tf.nn.embedding_lookup: it just wraps it, and creates the embedding matrix (tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))) for you. Although this is convenient and less verbose, by using embed_sequence there doesn't seem to be a direct way to access the embeddings. So if you want them, you have to query the internal variable used as the embedding matrix, using the same name scope. I have to admit that the code in the tutorial above is confusing; I even suspect it uses different embeddings in the encoder and the decoder.
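To make that concrete, here is a minimal sketch of both forms, plus one way to reach the matrix that embed_sequence created. It assumes TF 1.x and the default 'EmbedSequence' variable scope that embed_sequence uses internally; if you passed your own scope, query that name instead:

import tensorflow as tf

# Form 1: embed_sequence builds the embedding matrix internally
# and returns the looked-up embeddings directly.
enc_embed_input = tf.contrib.layers.embed_sequence(
    inputs, vocab_size=target_vocab_size, embed_dim=decoding_embedding_size)

# Form 2: build the matrix yourself, then look it up explicitly.
dec_embeddings = tf.Variable(
    tf.random_uniform([target_vocab_size, decoding_embedding_size]))
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

# Querying the variable that embed_sequence created (assumes the default
# scope name 'EmbedSequence' and internal variable name 'embeddings'):
with tf.variable_scope('EmbedSequence', reuse=True):
    internal_embeddings = tf.get_variable('embeddings')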
TrainingHelper doesn't need the embedding lookup, as it only forwards the already-embedded inputs to the decoder; GreedyEmbeddingHelper, on the other hand, does take the embedding matrix (or a callable that performs the lookup) as its first input, as mentioned in the documentation.
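As a rough sketch of the two helpers (TF 1.x contrib API; batch_size, target_lengths, start_token_id, and end_token_id stand in for whatever your graph defines):

# Training: the helper only steps through inputs that are already embedded.
train_helper = tf.contrib.seq2seq.TrainingHelper(
    inputs=dec_embed_input,        # [batch, time, embed_dim]
    sequence_length=target_lengths)

# Inference: the helper must embed its own predictions at every step,
# so it takes the embedding matrix (or a callable) as its first argument.
infer_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=dec_embeddings,      # [vocab_size, embed_dim]
    start_tokens=tf.fill([batch_size], start_token_id),
    end_token=end_token_id)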
Upvotes: 2
Reputation: 86
If I understand you correctly, the first question is about the difference between tf.contrib.layers.embed_sequence and tf.nn.embedding_lookup.
According to the official docs (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence):
Typical use case would be reusing embeddings between an encoder and decoder.
I think tf.contrib.layers.embed_sequence is designed for seq2seq models.
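For instance, a minimal sketch of the "reusing embeddings between an encoder and decoder" use case from the docs (TF 1.x; source_ids, target_ids, vocab_size, embed_dim, and the scope name are hypothetical):

# Encoder and decoder share one embedding matrix by reusing the same scope.
enc_embed = tf.contrib.layers.embed_sequence(
    source_ids, vocab_size=vocab_size, embed_dim=embed_dim,
    scope='shared_embed')
dec_embed = tf.contrib.layers.embed_sequence(
    target_ids, vocab_size=vocab_size, embed_dim=embed_dim,
    scope='shared_embed', reuse=True)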
I found the following post, where @ispirmustafa mentioned:
embedding_lookup doesn't support invalid ids.
Also, in another post, tf.contrib.layers.embed_sequence() is for what?, @user1930402 said:
- When building a neural network model that has multiple gates taking features as input, using tensorflow.contrib.layers.embed_sequence lets you reduce the number of parameters in your network while preserving depth. For example, it eliminates the need for each gate of the LSTM to perform its own linear projection of the features.
- It allows for arbitrary input shapes, which keeps the implementation simple and flexible.
For the second question: sorry, I haven't used TrainingHelper, so I can't answer that part.
Upvotes: 1