arrhhh

Reputation: 99

RNN and LSTM implementation in TensorFlow

I have been trying to learn how to code up an RNN and an LSTM in TensorFlow. I found an example online in this blog post:

http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html

Below are the snippets I am having trouble understanding, from an LSTM network that will eventually be used for char-rnn generation:

    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')

    embeddings = tf.get_variable('embedding_matrix', [num_classes, state_size])
    rnn_inputs = [tf.squeeze(i) for i in tf.split(1,
                            num_steps, tf.nn.embedding_lookup(embeddings, x))]

A different section of the code now, where the weights are defined:

    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
    logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]

    y_as_list = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(1, num_steps, y)]

x is the data to be fed, and y is the set of labels. In the LSTM equations we have a series of gates: x(t) gets multiplied by one set of weights, prev_hidden_state gets multiplied by another set of weights, biases are added, and non-linearities are applied.
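
For reference, my understanding of the standard LSTM equations is roughly this (my own notation, not from the blog post); each gate has its own weight matrices for x_t and for the previous hidden state h_{t-1}:

    i_t = sigmoid(W_xi x_t + W_hi h_{t-1} + b_i)    (input gate)
    f_t = sigmoid(W_xf x_t + W_hf h_{t-1} + b_f)    (forget gate)
    o_t = sigmoid(W_xo x_t + W_ho h_{t-1} + b_o)    (output gate)
    g_t = tanh(W_xg x_t + W_hg h_{t-1} + b_g)       (candidate cell state)
    c_t = f_t * c_{t-1} + i_t * g_t                 (new cell state)
    h_t = o_t * tanh(c_t)                           (new hidden state)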

Here are the doubts I have

Upvotes: 2

Views: 1953

Answers (1)

DAlolicorn

Reputation: 106

Let me try to help.

In this case only one weight matrix is defined; does that mean that it works for both x(t) and prev_hidden_state as well?

There are more weights created when you call tf.nn.rnn_cell.LSTMCell. They are the internal weights of the RNN cell, which TensorFlow creates implicitly when you call the cell.

The weight matrix you explicitly defined is the transform from the hidden state to the vocabulary space.

You can view the implicit weights as accounting for the recurrent part: they take the previous hidden state and the current input and output the new hidden state. The weight matrix you defined transforms the hidden states (i.e. state_size = 200) to the larger vocabulary space (i.e. vocab_size = 2000).
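
A minimal sketch of how the two kinds of weights show up in code, using the same pre-1.0 TensorFlow API as the blog post (the variable names cell, rnn_outputs and final_state are mine, not from the original code):

    # the cell's internal (recurrent) weights are created implicitly here
    cell = tf.nn.rnn_cell.LSTMCell(state_size)
    # unroll the cell over the list of per-step inputs
    rnn_outputs, final_state = tf.nn.rnn(cell, rnn_inputs, dtype=tf.float32)

    with tf.variable_scope('softmax'):
        # the only weights you define explicitly: they project each
        # (batch_size, state_size) hidden state to (batch_size, num_classes) logits
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
    logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]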

For further information, you can have a look at this tutorial: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

For the embedding matrix, I know it has to be multiplied by the weight matrix, but why is the first dimension num_classes?

num_classes accounts for the vocab_size; the embedding matrix transforms the vocabulary into the required embedding size (which in this example is equal to the state_size).
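
One way to see why the first dimension is num_classes: looking up a row of the embedding matrix is equivalent to multiplying a one-hot vector of length num_classes by that matrix. A hypothetical shape check (word id 7 is just an example):

    # one-hot vector for word id 7: shape (1, num_classes)
    one_hot = tf.one_hot([7], depth=num_classes, dtype=tf.float32)
    # multiplying by the (num_classes, state_size) embedding matrix selects row 7: shape (1, state_size)
    by_matmul = tf.matmul(one_hot, embeddings)
    # embedding_lookup returns the same row directly, without building the one-hot vector
    by_lookup = tf.nn.embedding_lookup(embeddings, [7])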

For the rnn_inputs we are using squeeze, which removes dimensions of size 1, but why would I want to do that with a one-hot encoding?

You need to get rid of the extra dimension because tf.nn.rnn takes inputs as (batch_size, input_size) instead of (batch_size, 1, input_size).

Also, from the splits I understand that we are unrolling the x of dimension (batch_size x num_steps) into discrete (batch_size x 1) vectors and then passing these values through the network. Is this right?

To be more precise: this happens after the embedding, so the (batch_size, num_steps, state_size) tensor turns into a list of num_steps elements, each of size (batch_size, 1, state_size).

The flow goes like this:

  1. The embedding matrix embeds each word as a state_size-dimensional vector (a row of the matrix), so its size is (vocab_size, state_size).
  2. Retrieving the indices specified by the x placeholder gives the RNN input, which has size (batch_size, num_steps, state_size).
  3. tf.split splits that input into num_steps tensors of size (batch_size, 1, state_size).
  4. tf.squeeze squeezes them to (batch_size, state_size), forming the desired input format for tf.nn.rnn.
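
Putting the four steps together, this is roughly how the blog's input pipeline reads with shape comments added (the intermediate names embedded and pieces are mine):

    # 1. one state_size-dimensional row per vocabulary entry: (num_classes, state_size)
    embeddings = tf.get_variable('embedding_matrix', [num_classes, state_size])
    # 2. look up the ids in x: (batch_size, num_steps) -> (batch_size, num_steps, state_size)
    embedded = tf.nn.embedding_lookup(embeddings, x)
    # 3. split along the time axis into num_steps tensors of shape (batch_size, 1, state_size)
    pieces = tf.split(1, num_steps, embedded)
    # 4. squeeze out the singleton time dimension: each piece becomes (batch_size, state_size),
    #    which is the input format tf.nn.rnn expects
    rnn_inputs = [tf.squeeze(p, squeeze_dims=[1]) for p in pieces]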

If there is any problem with the TensorFlow methods, you can look them up in the TensorFlow API documentation for a more detailed introduction.

Upvotes: 1
