Anthony Sun

Reputation: 115

Trying to understand deep RNN weights

I am trying to understand which weights are trained in an RNN. For a simple RNN with one layer it is easy to understand: for example, if the input shape for the time step is [50, 3], there are 3 weights to train for each feature, plus the bias, plus the weight for the hidden state. But I am struggling to understand how the parameter count becomes 12, 21, and 32 as the number of RNN units increases. Thanks for any guidance.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(1, return_sequences = False, input_shape = [50, 3]),    # 3 input features, 1 unit (Wx and Wh)
    Dense(1)
])


model.summary()

model2 = Sequential([
    SimpleRNN(2, return_sequences = False, input_shape = [50, 3]),
    Dense(1) # the last recurrent layer does not need return_sequences
])

model2.summary()


model3 = Sequential([
    SimpleRNN(3, return_sequences = False, input_shape = [50, 3]),
    Dense(1) # the last recurrent layer does not need return_sequences
])

model3.summary()


model4 = Sequential([
    SimpleRNN(4, return_sequences = False, input_shape = [50, 3]),
    Dense(1) # the last recurrent layer does not need return_sequences
])

model4.summary()
Model: "sequential_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_22 (SimpleRNN)    (None, 1)                 5         
_________________________________________________________________
dense_18 (Dense)             (None, 1)                 2         
=================================================================
Total params: 7
Trainable params: 7
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_23 (SimpleRNN)    (None, 2)                 12        
_________________________________________________________________
dense_19 (Dense)             (None, 1)                 3         
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_22"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_24 (SimpleRNN)    (None, 3)                 21        
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 4         
=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_25 (SimpleRNN)    (None, 4)                 32        
_________________________________________________________________
dense_21 (Dense)             (None, 1)                 5         
=================================================================
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________
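The `Param #` column in the summaries above can be reproduced by hand. A `SimpleRNN` layer with `h` units and `f` input features stores three arrays (the names below follow Keras's `get_weights()` ordering and are used here only as labels): a kernel of shape `(f, h)` for input-to-hidden weights, a recurrent kernel of shape `(h, h)` for hidden-to-hidden weights, and a bias of shape `(h,)`. A minimal sketch checking all four models:

```python
def simple_rnn_params(h, f=3):
    # kernel (f, h) + recurrent_kernel (h, h) + bias (h,)
    return f * h + h * h + h

def dense_params(n_in, n_out=1):
    # weight matrix (n_in, n_out) + bias (n_out,)
    return n_out * n_in + n_out

# Expected (SimpleRNN, Dense) counts from the four summaries above
expected = [(5, 2), (12, 3), (21, 4), (32, 5)]
for h, (rnn_count, dense_count) in zip(range(1, 5), expected):
    assert simple_rnn_params(h) == rnn_count
    assert dense_params(h) == dense_count
```

Note that the sequence length (50) never appears in the count: the same weights are reused at every time step.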

Upvotes: 0

Views: 199

Answers (1)

amin

Reputation: 289

For your model 2:

model2 = Sequential([
    SimpleRNN(2, return_sequences = False, input_shape = [50, 3]),    
    Dense(1) # last do not neeed the return sequencies
])

The image below shows the weights feeding one of the neurons (5 weights), plus 1 bias. So each neuron has 6 parameters, and the total parameter count is 6 * 2 = 12.

[figure: the five weights feeding into one of the two recurrent neurons]

The formula for your example is:

h * (3 + h) + h

where h is the number of units, (3 + h) is the number of weights per neuron (3 input weights plus h recurrent weights), and the final h adds the biases.
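Plugging h = 1..4 into that formula reproduces the `Param #` column from the four summaries in the question:

```python
# h * (3 + h) + h for each of the four SimpleRNN layer sizes
counts = [h * (3 + h) + h for h in range(1, 5)]
print(counts)  # [5, 12, 21, 32]
```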

Upvotes: 1
