csej

Reputation: 55

Inconsistency between GRU and RNN implementation

I'm trying to implement some custom GRU cells using TensorFlow. I need to stack those cells, and I wanted to inherit from tensorflow.keras.layers.GRU. However, when looking at the source code, I noticed that you can only pass a units argument to the __init__ of GRU, whereas RNN takes a cell argument that can be a list of RNNCell objects, which it stacks by calling StackedRNNCells. GRU, meanwhile, only ever creates a single GRUCell.

For the paper I'm trying to implement, I actually need to stack GRUCells. Why are the implementations of RNN and GRU different?
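
To make the asymmetry concrete, here is a minimal sketch of the two signatures (Keras API; the unit sizes are arbitrary):

import tensorflow as tf

# GRU takes a single `units` argument and internally builds exactly one GRUCell:
gru_layer = tf.keras.layers.GRU(64)

# RNN, by contrast, accepts a list of cells and wraps them in StackedRNNCells:
stacked = tf.keras.layers.RNN(
    [tf.keras.layers.GRUCell(64), tf.keras.layers.GRUCell(64)])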

Upvotes: 1

Views: 631

Answers (2)

Yohan Potts

Reputation: 1

import tensorflow as tf

# (rnn_size, keep_prob, num_layers, int_to_vocab, embed_dim and learning_rate
#  are assumed to be defined earlier.)

train_graph = tf.Graph()
with train_graph.as_default():

    # Initialize input placeholders
    input_text = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    # Calculate text attributes
    vocab_size = len(int_to_vocab)
    input_text_shape = tf.shape(input_text)

    # Build the RNN cell (one fresh cell per layer, so weights are not shared)
    cells = [tf.contrib.rnn.DropoutWrapper(
                 tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size),
                 output_keep_prob=keep_prob)
             for _ in range(num_layers)]
    cell = tf.contrib.rnn.MultiRNNCell(cells)

    # Set the initial state
    initial_state = cell.zero_state(input_text_shape[0], tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')

    # Create word embedding as input to RNN
    embed = tf.contrib.layers.embed_sequence(input_text, vocab_size, embed_dim)

    # Build RNN
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')

    # Take RNN output and make logits
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    # Calculate the probability of generating each word
    probs = tf.nn.softmax(logits, name='probs')

    # Define loss function
    cost = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_text_shape[0], input_text_shape[1]]))

    # Learning rate optimizer (fed through the lr placeholder defined above)
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient clipping to avoid exploding gradients
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var)
                        for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
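
A hypothetical way to run a single training step on this graph (TF 1.x session API; x_batch, y_batch and learning_rate are assumed to be defined elsewhere):

with train_graph.as_default():
    init_op = tf.global_variables_initializer()

with tf.Session(graph=train_graph) as sess:
    sess.run(init_op)
    batch_loss, _ = sess.run(
        [cost, train_op],
        feed_dict={input_text: x_batch, targets: y_batch, lr: learning_rate})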

Upvotes: 0

Engineero

Reputation: 12948

While searching for the documentation for these classes to add links, I noticed something that may be tripping you up: there are (currently, just before the official TF 2.0 release) two GRUCell implementations in TensorFlow! There is a tf.nn.rnn_cell.GRUCell and a tf.keras.layers.GRUCell. It looks like the one from tf.nn.rnn_cell is deprecated, and the Keras one is the one you should use.

From what I can tell, the GRUCell has the same __call__() method signature as tf.keras.layers.LSTMCell and tf.keras.layers.SimpleRNNCell, and they all inherit from Layer. The RNN documentation gives some requirements on what the __call__() method of the objects you pass to its cell argument must do, but my guess is that all three of these should meet those requirements. You should be able to just use the same RNN framework and pass it a list of GRUCell objects instead of LSTMCell or SimpleRNNCell.

I can't test this right now, so I'm not sure whether you pass a list of GRUCell objects or GRU objects into RNN, but I think one of those should work; a rough sketch of the cell-list option is below.
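
A minimal sketch of that route, assuming the cell list works as described (the unit size and input shape are arbitrary):

import tensorflow as tf

# RNN accepts a list of cells and stacks them (via StackedRNNCells) internally.
cells = [tf.keras.layers.GRUCell(32) for _ in range(3)]
stacked_gru = tf.keras.layers.RNN(cells, return_sequences=True)

# Arbitrary input: sequences of 10 timesteps with 8 features each.
inputs = tf.keras.Input(shape=(10, 8))
outputs = stacked_gru(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()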

Upvotes: 1
