Reputation: 43
In particular, I'm confused about what it means for an LSTM layer to have (say) 50 cells. Consider the following LSTM block from this awesome blog post:
Say my input x_t is a (20,) vector and the hidden state h_t is a (50,) vector. Given that the cell state C_t undergoes only point-wise operations (point-wise tanh and *) before producing the new hidden state, I gather that C_t.shape = h_t.shape = (50,). Now, the forget gate looks at the input concatenated with the hidden state, which would be a (20+50,) = (70,) vector, so the forget gate must have a weight matrix of shape (50, 70), such that dot(W, [x_t, h_t]).shape = (50,).
So my question is: am I looking at an LSTM block with 50 cells when C_t.shape = (50,)? Or am I misunderstanding what it means for an LSTM layer to have 50 cells?
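To make the shape reasoning above concrete, here is a minimal NumPy sketch of the forget-gate step. All names (x_t, h_prev, W_f, etc.) are illustrative, not from any library:

```python
import numpy as np

n_input, n_cells = 20, 50
rng = np.random.default_rng(0)

x_t = rng.standard_normal(n_input)      # input vector, shape (20,)
h_prev = rng.standard_normal(n_cells)   # previous hidden state, shape (50,)
C_prev = rng.standard_normal(n_cells)   # previous cell state, shape (50,)

concat = np.concatenate([x_t, h_prev])  # shape (70,)

# Forget-gate parameters: one row of weights per cell.
W_f = rng.standard_normal((n_cells, n_input + n_cells))  # shape (50, 70)
b_f = np.zeros(n_cells)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

f_t = sigmoid(W_f @ concat + b_f)   # forget gate output, shape (50,)
C_t = f_t * C_prev                  # point-wise multiply keeps shape (50,)

print(f_t.shape, C_t.shape)  # (50,) (50,)
```

Every per-cell quantity (gate output, cell state, hidden state) stays a length-50 vector; only the weight matrices mix the 70 concatenated inputs down to 50 outputs.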
Upvotes: 3
Views: 748
Reputation: 28022
I understand what you are getting confused by. Basically, the black line connecting the two boxes at the top, which represents the cell state, is actually a bundle of 50 very thin lines grouped together, one per cell. The forget gate likewise outputs 50 values, and these 50 values are multiplied point-wise with the 50 values of the cell state.
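In other words, every point-wise operation in the diagram acts elementwise on a length-50 vector. A small NumPy illustration (array names are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Cell state: the "50 small lines" carried as one array of 50 values.
C_prev = rng.standard_normal(50)

# Forget gate output: 50 values, each squashed into (0, 1) by a sigmoid.
f_t = 1.0 / (1.0 + np.exp(-rng.standard_normal(50)))

# Point-wise multiplication: each forget value scales the
# corresponding component of the cell state independently.
C_scaled = f_t * C_prev
print(C_scaled.shape)  # (50,)
```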
Upvotes: 4