Reputation: 55
I'm trying to implement some custom GRU cells using TensorFlow. I need to stack those cells, and I wanted to inherit from tensorflow.keras.layers.GRU. However, when looking at the source code, I noticed that you can only pass a units argument to the __init__ of GRU, while RNN has an argument that takes a list of RNNCell objects and leverages it to stack those cells by calling StackedRNNCells. Meanwhile, GRU only creates a single GRUCell.
For the paper I'm trying to implement, I actually need to stack GRUCells. Why are the implementations of RNN and GRU different?
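To make the difference concrete, here is roughly what I mean (an illustrative sketch based on my reading of the tf.keras source, not code I have run):
import tensorflow as tf

# RNN accepts a list of cells and wraps them in StackedRNNCells internally
cells = [tf.keras.layers.GRUCell(32), tf.keras.layers.GRUCell(32)]
stacked = tf.keras.layers.RNN(cells, return_sequences=True)

# GRU only exposes a `units` argument, so it builds exactly one GRUCell
single_layer_gru = tf.keras.layers.GRU(32, return_sequences=True)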
Upvotes: 1
Views: 631
Reputation: 1
import tensorflow as tf

# Hyperparameters (rnn_size, keep_prob, num_layers, embed_dim, int_to_vocab)
# are assumed to be defined elsewhere in the script.
train_graph = tf.Graph()
with train_graph.as_default():
    # Initialize input placeholders
    input_text = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    # Calculate text attributes
    vocab_size = len(int_to_vocab)
    input_text_shape = tf.shape(input_text)

    # Build the RNN cell: one fresh cell per layer, so the layers do not
    # share weights ([drop_cell] * num_layers would reuse the same cell object)
    cells = [
        tf.contrib.rnn.DropoutWrapper(
            tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size),
            output_keep_prob=keep_prob)
        for _ in range(num_layers)
    ]
    cell = tf.contrib.rnn.MultiRNNCell(cells)

    # Set the initial state
    initial_state = cell.zero_state(input_text_shape[0], tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')

    # Create word embedding as input to RNN
    embed = tf.contrib.layers.embed_sequence(input_text, vocab_size, embed_dim)

    # Build RNN
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')

    # Take RNN output and make logits
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    # Calculate the probability of generating each word
    probs = tf.nn.softmax(logits, name='probs')

    # Define loss function
    cost = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_text_shape[0], input_text_shape[1]])
    )

    # Learning rate optimizer (uses the learning-rate placeholder defined above)
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient clipping to avoid exploding gradients
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var)
                        for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
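For the question as asked (stacking GRU cells rather than LSTM cells), the same TF 1.x pattern should presumably work with tf.contrib.rnn.GRUCell swapped in for BasicLSTMCell; a sketch, untested, reusing rnn_size, keep_prob, num_layers and embed from above:
# Stack GRU cells instead of LSTM cells (TF 1.x contrib API)
gru_cells = [
    tf.contrib.rnn.DropoutWrapper(
        tf.contrib.rnn.GRUCell(num_units=rnn_size),
        output_keep_prob=keep_prob)
    for _ in range(num_layers)
]
gru_stack = tf.contrib.rnn.MultiRNNCell(gru_cells)
outputs, final_state = tf.nn.dynamic_rnn(gru_stack, embed, dtype=tf.float32)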
Upvotes: 0
Reputation: 12948
While searching for the documentation for these classes to add links, I noticed something that may be tripping you up: there are (currently, just before the official TF 2.0 release) two GRUCell implementations in TensorFlow! There is a tf.nn.rnn_cell.GRUCell and a tf.keras.layers.GRUCell. It looks like the one from tf.nn.rnn_cell is deprecated, and the Keras one is the one you should use.
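For reference, the two classes I'm referring to (names as in the current pre-2.0 API):
# Deprecated variant:
old_cell = tf.nn.rnn_cell.GRUCell(num_units=64)
# Keras variant, the one to prefer going forward:
new_cell = tf.keras.layers.GRUCell(64)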
From what I can tell, GRUCell has the same __call__() method signature as tf.keras.layers.LSTMCell and tf.keras.layers.SimpleRNNCell, and they all inherit from Layer. The RNN documentation gives some requirements on what the __call__() method of the objects you pass to its cell argument must do, but my guess is that all three of these should meet those requirements. You should be able to just use the same RNN framework and pass it a list of GRUCell objects instead of LSTMCell or SimpleRNNCell.
I can't test this right now, so I'm not sure whether you should pass a list of GRUCell objects or just GRU objects into RNN, but I think one of those should work.
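In case it helps, a minimal sketch of the first option (a list of GRUCell objects passed to RNN); untested, with made-up layer sizes:
import tensorflow as tf

# One GRUCell per layer; RNN wraps the list in StackedRNNCells for you
cells = [tf.keras.layers.GRUCell(64) for _ in range(3)]
stacked_gru = tf.keras.layers.RNN(cells, return_sequences=True)

# Example usage on (batch, time, features) input
inputs = tf.keras.Input(shape=(None, 128))
outputs = stacked_gru(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()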
Upvotes: 1