Reputation: 1
I'm working on a prediction project using an LSTM model in TensorFlow. The implementation runs, but the results are poor: the accuracy on the test set is only 0.5. So I searched for tricks for training LSTM-based models and found "adding dropout".
However, after following a tutorial, I ran into an error.
Here's the original version, which worked:
def lstmModel(x, weights, biases):
    x = tf.unstack(x, time_step, 1)
    lstm_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True, forget_bias=1)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
After changing it to the version below, I get this error:
ValueError: Shape (90, ?) must have rank at least 3
def lstmModel(x, weights, biases):
    x = tf.unstack(x, time_step, 1)
    lstm_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True, forget_bias=1)
    lstm_dropout = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.5)
    lstm_layers = rnn.MultiRNNCell([lstm_dropout] * 3)
    outputs, states = tf.nn.dynamic_rnn(lstm_layers, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
I'm wondering whether the shape of my input data is wrong. Before entering this function, the input x has shape (batch_size, time_step, data_size), where:

batch_size = 30
time_step = 4   # read 4 words
data_size = 80  # 80 words in total; each word is in np.shape [1, 80]

So each batch of x has shape [30, 4, 80].
And the input word x[0, 0, :] is followed by the word x[0, 1, :]. Does this design make sense?
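To make the layout concrete, here is a minimal dummy batch with these shapes (illustrative numpy only, not my real data pipeline):

import numpy as np

batch_size, time_step, data_size = 30, 4, 80
x_batch = np.zeros((batch_size, time_step, data_size), dtype=np.float32)
print(x_batch.shape)  # (30, 4, 80)
# x_batch[0, 0] and x_batch[0, 1] are two consecutive 80-dim word vectors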
The whole implementation is adapted from another tutorial, and I also wonder what tf.unstack() actually does.
That's several problems above... I have put the code on GitHub with the "worked version" and "failed version" mentioned above. Only the function shown here differs! Please take a look, thanks!
Upvotes: 0
Views: 529
Reputation: 1206
Removing tf.unstack from the second example should help.

tf.unstack is used to break a tensor into a list of tensors. In your case, it will break x, which has shape (batch_size, time_step, data_size), into a list of length time_step containing tensors of shape (batch_size, data_size).
This is needed for tf.nn.static_rnn, since it unfolds the RNN during graph creation and therefore needs a pre-specified number of steps, which is the length of the list coming from tf.unstack.
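For example, with the shapes from your question, the static path looks roughly like this (a sketch against the TF 1.x API; the placeholder and n_hidden value are illustrative, not taken from your code):

import tensorflow as tf
from tensorflow.contrib import rnn

time_step, data_size, n_hidden = 4, 80, 64   # n_hidden is a made-up value here
x = tf.placeholder(tf.float32, [None, time_step, data_size])
x_list = tf.unstack(x, time_step, 1)   # list of 4 tensors, each (batch_size, 80)
cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True, forget_bias=1)
outputs, state = rnn.static_rnn(cell, x_list, dtype=tf.float32)
# outputs is a Python list of length time_step;
# outputs[-1] is the last step, shape (batch_size, n_hidden)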
tf.nn.dynamic_rnn is unfolded at each run, so it can handle a variable number of steps; therefore it takes a single tensor in which dimension 0 is the batch_size, dimension 1 is the time_step, and dimension 2 is the data_size (or the first two dimensions are swapped if time_major is True).
The error is due to tf.nn.dynamic_rnn expecting a 3D tensor, while each element of the list you supplied as input is only 2D because of tf.unstack.
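Concretely, your failing function should work once the tf.unstack call is dropped. A sketch (not tested against your repo) with two extra caveats: dynamic_rnn returns a single batch-major tensor, so the last time step is outputs[:, -1, :] rather than outputs[-1], and recent TF 1.x releases want a fresh cell object per layer rather than [cell] * 3:

def lstmModel(x, weights, biases):
    # x keeps its original 3D shape (batch_size, time_step, data_size); no tf.unstack
    def make_cell():
        cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True, forget_bias=1)
        return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
    lstm_layers = tf.nn.rnn_cell.MultiRNNCell([make_cell() for _ in range(3)])
    outputs, states = tf.nn.dynamic_rnn(lstm_layers, x, dtype=tf.float32)
    # outputs has shape (batch_size, time_step, n_hidden); take the last time step
    return tf.matmul(outputs[:, -1, :], weights['out']) + biases['out']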
tl;dr: Use tf.unstack with tf.nn.static_rnn, but never with tf.nn.dynamic_rnn.
Upvotes: 1