Reputation: 99
I have been trying to learn how to code up an RNN and an LSTM in TensorFlow. I found an example in this blog post:
http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
Below are the snippets I am having trouble understanding, for an LSTM network to be used eventually for char-rnn generation:
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
embeddings = tf.get_variable('embedding_matrix', [num_classes, state_size])
rnn_inputs = [tf.squeeze(i) for i in
              tf.split(1, num_steps, tf.nn.embedding_lookup(embeddings, x))]
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]
y_as_list = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(1, num_steps, y)]
x is the data to be fed and y is the set of labels. In the LSTM equations we have a series of gates: x(t) gets multiplied by one set of weights, prev_hidden_state gets multiplied by another set of weights, biases are added, and non-linearities are applied.
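To make sure I have the gate equations straight, here is a rough NumPy sketch of a single LSTM step as I understand it (my own illustration, not code from the blog; the weight names and layout are made up):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    # x_t: (batch, input_size), h_prev/c_prev: (batch, state_size)
    # W_x: (input_size, 4*state_size), W_h: (state_size, 4*state_size), b: (4*state_size,)
    z = x_t @ W_x + h_prev @ W_h + b              # one affine transform for all four gates
    i, f, o, g = np.split(z, 4, axis=1)           # input, forget, output gates and candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed to (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c_t = f * c_prev + i * g                      # new cell state
    h_t = o * np.tanh(c_t)                        # new hidden state
    return h_t, c_t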
Upvotes: 2
Views: 1953
Reputation: 106
Maybe I can help.
There are more weights created when you call tf.nn.rnn_cell.LSTMCell. They are the internal weights of the RNN cell, which TensorFlow creates implicitly when you call the cell.
The weight matrix you explicitly defined is the transform from the hidden state to the vocabulary space.
You can view the implicit weights as accounting for the recurrent part: they take the previous hidden state and the current input and produce the new hidden state. The weight matrix you defined transforms the hidden states (i.e. state_size = 200) to the larger vocabulary space (i.e. vocab_size = 2000).
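To make the distinction concrete, here is a rough shape sketch (the exact layout of the cell's internal kernel is an implementation detail, so treat these shapes as approximate):

state_size, vocab_size = 200, 2000
embedding_size = state_size   # in this example the input to the cell is the embedding

# implicit weights created inside tf.nn.rnn_cell.LSTMCell (roughly):
lstm_kernel_shape = (embedding_size + state_size, 4 * state_size)
lstm_bias_shape = (4 * state_size,)

# explicit weights you defined: hidden state -> vocabulary logits
softmax_W_shape = (state_size, vocab_size)
softmax_b_shape = (vocab_size,)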
For further information, maybe you can view this tutorial: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The num_classes accounts for the vocab_size; the embedding matrix transforms the vocabulary to the required embedding size (in this example equal to the state_size).
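A small NumPy sketch of what the lookup does to the shapes (the sizes here are just for illustration):

import numpy as np

batch_size, num_steps = 32, 10
vocab_size, state_size = 2000, 200

embeddings = np.random.randn(vocab_size, state_size)           # one row per vocabulary symbol
x = np.random.randint(0, vocab_size, (batch_size, num_steps))  # integer ids, like the placeholder

rnn_inputs = embeddings[x]     # fancy indexing, the same idea as tf.nn.embedding_lookup
print(rnn_inputs.shape)        # (32, 10, 200) == (batch_size, num_steps, state_size)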
You need to get rid of the extra dimension because tf.nn.rnn takes inputs as (batch_size, input_size) instead of (batch_size, 1, input_size).
Being more precise: after embedding, the (batch_size, num_steps, state_size) tensor turns into a list of num_steps elements, each of size (batch_size, 1, state_size).
In the embedding matrix, each vocabulary symbol is represented by a state_size-dimensional vector (a row of the matrix), making its size (vocab_size, state_size). After the lookup, the embedded input has shape (batch_size, num_steps, state_size). tf.split splits the inputs into tensors of shape (batch_size, 1, state_size), and tf.squeeze squeezes them to (batch_size, state_size), forming the desired input format for tf.nn.rnn.
If there's any problem with the TensorFlow methods, maybe you can search them in the TensorFlow API for a more detailed introduction.
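If it helps, here is the whole split + squeeze shape pipeline redone in NumPy (only the shapes matter, the numbers are random):

import numpy as np

batch_size, num_steps, state_size = 32, 10, 200
embedded = np.random.randn(batch_size, num_steps, state_size)

# tf.split(1, num_steps, ...) -> num_steps tensors of shape (batch_size, 1, state_size)
chunks = np.split(embedded, num_steps, axis=1)
print(chunks[0].shape)         # (32, 1, 200)

# tf.squeeze(..., squeeze_dims=[1]) -> (batch_size, state_size), what tf.nn.rnn expects
rnn_inputs = [np.squeeze(c, axis=1) for c in chunks]
print(len(rnn_inputs), rnn_inputs[0].shape)    # 10 (32, 200)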
Upvotes: 1