reese0106

Reputation: 2061

Implement Embedding Dropout in Tensorflow

I am reading the paper "Regularizing and Optimizing LSTM Language Models", which describes embedding dropout: "As the dropout occurs on the embedding matrix that is used for a full forward and backward pass, this means that all occurrences of a specific word will disappear within that pass, equivalent to performing variational dropout on the connection between the one-hot embedding and the embedding lookup." However, I cannot figure out a good way to do this within a TensorFlow experiment. For each new batch, I currently embed my sequence with the following code:

embedding_sequence = tf.contrib.layers.embed_sequence(features['input_sequence'], vocab_size=n_tokens, embed_dim=word_embedding_size)

Now I could easily apply dropout to embedding_sequence; however, my reading of the paper is that the same words should be dropped for the entire forward/backward pass. Any suggestions for a simple approach that would still allow me to use embed_sequence? Here is what I think my approach should be after breaking embed_sequence apart, but I'm still not convinced it is correct...

PROPOSED SOLUTION

embedding_matrix = tf.get_variable("embeddings", shape=[vocab_size, embed_dim], dtype=tf.float32, initializer=None, trainable=True)
embedding_matrix_dropout = tf.nn.dropout(embedding_matrix, keep_prob=keep_prob)
embedding_sequence = tf.nn.embedding_lookup(embedding_matrix_dropout, features['input_sequence'])

Is there a more appropriate way to handle this? Is there anything I am getting from embed_sequence that I will not get from my proposed solution?

Secondary things I'm unsure about:

  1. What should my embedding_matrix initializer be? The default is set to None.
  2. tf.nn.dropout appears to handle the scaling by 1/keep_prob that the paper says is necessary, correct? (See the quick check after this list.)
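
For reference, this is the quick check I plan to run to confirm the scaling (assuming the TF 1.x API, where keep_prob is still the argument name; the shapes are just for illustration):

import tensorflow as tf

# A toy "embedding matrix" of ones so the scaling is easy to see.
ones = tf.ones([4, 3])
dropped = tf.nn.dropout(ones, keep_prob=0.5)

with tf.Session() as sess:
    # Kept entries come back as 1 / keep_prob = 2.0; dropped entries are 0.
    print(sess.run(dropped))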

Upvotes: 3

Views: 2134

Answers (2)

mona

Reputation: 31

If you are using the Keras API you can use tf.keras.layers.Dropout(0.2, noise_shape=[batch_size1, 4, 1]) on top of the embedding layer.

play with it:

    import tensorflow as tf

    embedding_dim1 = 3
    vocab_size1 = 4
    batch_size1 = 1
    max_timestamp = 4

    model1 = tf.keras.Sequential([
      tf.keras.layers.Embedding(vocab_size1, embedding_dim1,
                                batch_input_shape=[batch_size1, None]),
      tf.keras.layers.Dropout(0.2, noise_shape=[batch_size1, max_timestamp, 1])
      #tf.keras.layers.Dropout(0.2) this is not what you want
      #tf.keras.layers.Dropout(0.2, noise_shape=[batch_size1, None, 1]) not good. can't take dynamic shape
    ])

    # training=True is needed for the dropout mask to actually be applied.
    model1(tf.constant([[1, 2, 3, 0]]), training=True)

Read about the noise_shape argument at https://www.tensorflow.org/api_docs/python/tf/nn/dropout
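
A toy check of what noise_shape=[batch, timesteps, 1] does (my own example, not from the docs, running eagerly in TF 2.x): one keep/drop decision is broadcast across the embedding dimension, so a whole word vector is zeroed at once.

    import tensorflow as tf

    x = tf.ones([1, 4, 3])  # [batch, timesteps, embedding_dim]
    y = tf.keras.layers.Dropout(0.5, noise_shape=[1, 4, 1])(x, training=True)
    # Each timestep is either all zeros or all 2.0 (kept values scaled by 1 / (1 - rate)).
    print(y)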

Upvotes: 1

Vivek Verma

Reputation: 21

You can implement embedding dropout like this:

with tf.variable_scope('embedding'):
    self.embedding_matrix = tf.get_variable(
        "embedding", shape=[self.vocab_size, self.embd_size],
        dtype=tf.float32, initializer=self.initializer)

with tf.name_scope("embedding_dropout"):
    self.embedding_matrix = tf.nn.dropout(
        self.embedding_matrix, keep_prob=self.embedding_dropout,
        noise_shape=[self.vocab_size, 1])

with tf.name_scope('input'):
    self.input_batch = tf.placeholder(tf.int64, shape=(None, None))
    self.inputs = tf.nn.embedding_lookup(self.embedding_matrix, self.input_batch)

This randomly sets rows of the embedding matrix to zero, as described in https://arxiv.org/pdf/1512.05287.pdf, which is cited in the paper you mentioned.
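
For completeness, here is roughly how the same idea could look in TF 2.x / Keras (a sketch with my own layer and argument names, not taken from the linked sources):

import tensorflow as tf

class EmbeddingDropout(tf.keras.layers.Layer):
    """Drops whole rows of the embedding matrix during training, so every
    occurrence of a dropped word disappears for that forward/backward pass."""

    def __init__(self, vocab_size, embed_dim, keep_prob=0.9, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.keep_prob = keep_prob
        self.embedding = self.add_weight(
            name="embedding", shape=[vocab_size, embed_dim])

    def call(self, token_ids, training=False):
        matrix = self.embedding
        if training:
            # noise_shape=[vocab_size, 1] broadcasts one keep/drop decision
            # across each row; kept rows are scaled by 1 / keep_prob.
            matrix = tf.nn.dropout(matrix, rate=1.0 - self.keep_prob,
                                   noise_shape=[self.vocab_size, 1])
        return tf.nn.embedding_lookup(matrix, token_ids)

# Example: look up a batch of token ids with word-level dropout active.
layer = EmbeddingDropout(vocab_size=10000, embed_dim=300, keep_prob=0.9)
vectors = layer(tf.constant([[1, 2, 3, 0]]), training=True)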

Source:

https://github.com/tensorflow/tensorflow/issues/14746

Similar PyTorch implementation:

https://github.com/salesforce/awd-lstm-lm/blob/master/embed_regularize.py

Upvotes: 2
