csej

Reputation: 55

Inconsistency between GRU and RNN implementation

I'm trying to implement some custom GRU cells using TensorFlow. I need to stack those cells, and I wanted to inherit from tensorflow.keras.layers.GRU. However, when looking at the source code, I noticed that you can only pass a units argument to the __init__ of GRU, whereas RNN takes a cell argument that can be a list of RNNCell objects, which it stacks by calling StackedRNNCells. GRU, meanwhile, only ever creates a single GRUCell.

For the paper I'm trying to implement, I actually need to stack GRUCells. Why are the implementations of RNN and GRU different?
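
To make the asymmetry concrete, here is a minimal sketch of the two signatures (Keras API; the unit sizes are arbitrary):

import tensorflow as tf

# GRU takes a single `units` argument and internally builds exactly one GRUCell:
gru_layer = tf.keras.layers.GRU(64)

# RNN, by contrast, accepts a list of cells and wraps them in StackedRNNCells:
stacked = tf.keras.layers.RNN(
    [tf.keras.layers.GRUCell(64), tf.keras.layers.GRUCell(64)])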

Upvotes: 1

Views: 631

Answers (2)

Yohan Potts

Reputation: 1

import tensorflow as tf

# (rnn_size, keep_prob, num_layers, int_to_vocab, embed_dim and learning_rate
#  are assumed to be defined earlier.)

train_graph = tf.Graph()
with train_graph.as_default():

    # Initialize input placeholders
    input_text = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    # Calculate text attributes
    vocab_size = len(int_to_vocab)
    input_text_shape = tf.shape(input_text)

    # Build the RNN cell (one fresh cell per layer, so weights are not shared)
    cells = [tf.contrib.rnn.DropoutWrapper(
                 tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size),
                 output_keep_prob=keep_prob)
             for _ in range(num_layers)]
    cell = tf.contrib.rnn.MultiRNNCell(cells)

    # Set the initial state
    initial_state = cell.zero_state(input_text_shape[0], tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')

    # Create word embedding as input to RNN
    embed = tf.contrib.layers.embed_sequence(input_text, vocab_size, embed_dim)

    # Build RNN
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')

    # Take RNN output and make logits
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    # Calculate the probability of generating each word
    probs = tf.nn.softmax(logits, name='probs')

    # Define loss function
    cost = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_text_shape[0], input_text_shape[1]]))

    # Learning rate optimizer (fed through the lr placeholder defined above)
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient clipping to avoid exploding gradients
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var)
                        for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
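
A hypothetical way to run a single training step on this graph (TF 1.x session API; x_batch, y_batch and learning_rate are assumed to be defined elsewhere):

with train_graph.as_default():
    init_op = tf.global_variables_initializer()

with tf.Session(graph=train_graph) as sess:
    sess.run(init_op)
    batch_loss, _ = sess.run(
        [cost, train_op],
        feed_dict={input_text: x_batch, targets: y_batch, lr: learning_rate})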

Upvotes: 0

Engineero

Reputation: 12948

While searching for the documentation for these classes to add links, I noticed something that may be tripping you up: there are (currently, just before the official TF 2.0 release) two GRUCell implementations in TensorFlow! There is a tf.nn.rnn_cell.GRUCell and a tf.keras.layers.GRUCell. It looks like the one from tf.nn.rnn_cell is deprecated, and the Keras one is the one you should use.

From what I can tell, the GRUCell has the same __call__() method signature as tf.keras.layers.LSTMCell and tf.keras.layers.SimpleRNNCell, and they all inherit from Layer. The RNN documentation gives some requirements on what the __call__() method of the objects you pass to its cell argument must do, but my guess is that all three of these should meet those requirements. You should be able to just use the same RNN framework and pass it a list of GRUCell objects instead of LSTMCell or SimpleRNNCell.

I can't test this right now, so I'm not sure whether you pass a list of GRUCell objects or GRU objects into RNN, but I think one of those should work; a rough sketch of the cell-list option is below.
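
A minimal sketch of that route, assuming the cell list works as described (the unit size and input shape are arbitrary):

import tensorflow as tf

# RNN accepts a list of cells and stacks them (via StackedRNNCells) internally.
cells = [tf.keras.layers.GRUCell(32) for _ in range(3)]
stacked_gru = tf.keras.layers.RNN(cells, return_sequences=True)

# Arbitrary input: sequences of 10 timesteps with 8 features each.
inputs = tf.keras.Input(shape=(10, 8))
outputs = stacked_gru(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()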

Upvotes: 1
