Reputation: 61
I have been trying to compute the number of parameters in an LSTM cell in Keras. I created two models, one with LSTM and the other with CuDNNLSTM.
Partial summaries of the models are as follows:
CuDNNLSTM Model:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, None, 300)         192000
_________________________________________________________________
bidirectional (Bidirectional (None, None, 600)         1444800
LSTM Model:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, None, 300)         192000
_________________________________________________________________
bidirectional (Bidirectional (None, None, 600)         1442400
The number of parameters in the LSTM layer matches the formula for LSTM parameter computation found all over the internet (my check is sketched below). However, the CuDNNLSTM layer has 2,400 extra parameters.
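Here is how I checked the LSTM count; a minimal sketch of the standard formula using the layer sizes from the summaries above:
input_dim = 300   # embedding output size
units = 300       # LSTM units per direction

# standard Keras LSTM: 4 gates (input, forget, cell, output), each with
# input weights, recurrent weights, and a single bias vector
per_direction = 4 * ((input_dim + units) * units + units)
print(2 * per_direction)  # 1442400 -- matches the LSTM model summary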
What is the cause of these extra parameters?
Code:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from tensorflow.compat.v1.keras.models import Sequential
from tensorflow.compat.v1.keras.layers import CuDNNLSTM, Bidirectional, Embedding, LSTM

model = Sequential()
model.add(Embedding(640, 300))
# <LSTM type> is either LSTM or CuDNNLSTM
model.add(Bidirectional(<LSTM type>(300, return_sequences=True)))
model.summary()  # prints the partial summaries shown above
Upvotes: 2
Views: 714
Reputation: 46
LSTM parameters can be grouped into 3 categories: input weight matrices (W), recurrent weight matrices (R), and biases (b). Part of the LSTM cell's computation is W*x + b_i + R*h + b_r, where b_i are the input biases and b_r are the recurrent biases.
If you let b = b_i + b_r, you can rewrite the above expression as W*x + R*h + b. In doing so, you've eliminated the need to keep two separate bias vectors (b_i and b_r); instead, you only need to store one vector (b).
cuDNN sticks with the original mathematical formulation and stores b_i and b_r separately. Keras does not; it only stores b. That's why cuDNN's LSTM has more parameters than Keras'.
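A quick back-of-the-envelope check with the sizes from the question (300 units, 4 gates, 2 directions) confirms that this accounts exactly for the difference:
units = 300
gates = 4        # input, forget, cell, output
directions = 2   # the Bidirectional wrapper doubles everything

# cuDNN keeps b_i and b_r separately, so each gate carries one extra
# bias vector of length `units` compared to Keras' single fused bias
extra = directions * gates * units
print(extra)  # 2400 == 1444800 - 1442400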
Upvotes: 1