Reputation: 1987
I'm referencing the code here https://github.com/martin-gorner/tensorflow-rnn-shakespeare/blob/master/rnn_train.py and am trying to convert the cell from GRUCell to LSTMCell. Here is an excerpt from the code.
# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin') # [ BATCHSIZE, INTERNALSIZE * NLAYERS]
# using a NLAYERS=3 layers of GRU cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo
# How to properly apply dropout in RNNs: see README.md
cells = [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)]
# "naive dropout" implementation
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=False)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep) # dropout for the softmax layer
Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
# Yr: [ BATCHSIZE, SEQLEN, INTERNALSIZE ]
# H: [ BATCHSIZE, INTERNALSIZE*NLAYERS ] # this is the last state in the sequence
H = tf.identity(H, name='H') # just to give it a name
I understand that LSTMCell has two states, the cell state C and the output state H. What I want to do is to feed initial_state with a tuple of both states. How can I do this properly? I have tried various approaches but always run into a TensorFlow error.
EDIT: This is one of the attempts:
# inputs
X = tf.placeholder(tf.uint8, [None, None], name='X') # [ BATCHSIZE, SEQLEN ]
Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0) # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# expected outputs = same sequence shifted by 1 since we are trying to predict the next character
Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_') # [ BATCHSIZE, SEQLEN ]
Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0) # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin') # [ BATCHSIZE, INTERNALSIZE * NLAYERS]
Cin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Cin')
initial_state = tf.nn.rnn_cell.LSTMStateTuple(Cin, Hin)
# using a NLAYERS=3 layers of GRU cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo
# How to properly apply dropout in RNNs: see README.md
cells = [rnn.LSTMCell(INTERNALSIZE) for _ in range(NLAYERS)]
# "naive dropout" implementation
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=True)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep) # dropout for the softmax layer
Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=initial_state)
It says "TypeError: 'Tensor' object is not iterable."
Thanks.
Upvotes: 0
Views: 1578
Reputation: 8078
The error is happening because you have to provide a tuple of state placeholders, one for each layer, when building the graph; then, when you're training, you must feed the state for every layer.
The error is saying: I need to iterate over a list of (c, m) tuples, one per cell, because you have multiple cells and I need to initialize all of their states, but all I see is a single Tensor, and I can't iterate over that.
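To make the expected structure concrete, here is a minimal NumPy-only sketch of what the initial state has to look like: one (c, h) pair per layer, collected in a tuple. The namedtuple here is just a stand-in for tf.nn.rnn_cell.LSTMStateTuple, and the sizes are illustrative:

```python
import numpy as np
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple, only to illustrate the
# structure MultiRNNCell expects when state_is_tuple=True
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])

num_layers, batch_size, state_size = 3, 4, 10

# What dynamic_rnn wants for a MultiRNNCell: one (c, h) pair PER LAYER,
# not a single stacked tensor
initial_state = tuple(
    LSTMStateTuple(
        c=np.zeros((batch_size, state_size), dtype=np.float32),
        h=np.zeros((batch_size, state_size), dtype=np.float32),
    )
    for _ in range(num_layers)
)

print(len(initial_state))        # 3 -> one entry per layer
print(initial_state[0].c.shape)  # (4, 10)
```

Passing a single Tensor where this nested tuple is expected is exactly what triggers the "'Tensor' object is not iterable" error.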
This snippet shows how to setup the placeholders when building the graph:
state_size = 10
num_layers = 3
X = tf.placeholder(tf.float32, [None, 100, 10])
# the second dimension is size 2 and represents
# c, m ( the cell and hidden state )
# set the batch_size to None
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2,
None, state_size])
# unstack along the layer axis: a list of num_layers tensors,
# each of shape [2, batch_size, state_size]
l = tf.unstack(state_placeholder, axis=0)
Then we create one LSTMStateTuple per layer and collect them in a tuple:
rnn_tuple_state = tuple(
[rnn.LSTMStateTuple(l[idx][0],l[idx][1])
for idx in range(num_layers)]
)
# create a separate cell per layer; duplicating one cell object
# ([rnn.LSTMCell(10, reuse=True)] * num_layers) would share the same
# weights across all layers (tested on tf.__version__ 1.7.0)
cells = [rnn.LSTMCell(state_size) for _ in range(num_layers)]
mc = rnn.MultiRNNCell(cells, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell=mc,
inputs=X,
initial_state=rnn_tuple_state,
dtype=tf.float32)
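At training time, the whole state travels through the single state_placeholder. Here is a NumPy-only sketch of the feed value and of what tf.unstack does to it; the sess.run call is shown only as a comment, since it needs the graph above and a live tf.Session:

```python
import numpy as np

num_layers, batch_size, state_size = 3, 4, 10

# Start-of-epoch state: zeros, shaped exactly like state_placeholder,
# i.e. [num_layers, 2, batch_size, state_size]
current_state = np.zeros((num_layers, 2, batch_size, state_size),
                         dtype=np.float32)

# What tf.unstack(state_placeholder, axis=0) produces, layer by layer:
# a [2, batch_size, state_size] slab whose two rows are c and m (h)
per_layer = [current_state[i] for i in range(num_layers)]
c0, m0 = per_layer[0][0], per_layer[0][1]
print(c0.shape)  # (4, 10)

# Hypothetical training step (requires the graph above and a tf.Session):
# outputs_val, current_state = sess.run(
#     [outputs, state],
#     feed_dict={X: x_batch, state_placeholder: current_state})
```

Feeding the returned state back in as current_state for the next batch carries the LSTM state across batches.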
Here is the relevant bit from the docs:
initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
So we end up creating a tuple of placeholders, one for each cell (layer), with the requisite shape (batch_size, state_size), where batch_size = None. I expounded on this answer.
Upvotes: 2