Reputation: 43
In particular, I'm confused about what it means for an LSTM layer to have (say) 50 cells. Consider the following LSTM block from this awesome blog post:
Say my input x_t is a (20,) vector and the hidden state h_t is a (50,) vector. Given that the cell state C_t undergoes only point-wise operations (point-wise tanh and *) before producing the new hidden state, I gather that C_t.shape = h_t.shape = (50,). Now, the forget gate looks at the input concatenated with the hidden state, which would be a (20+50,) = (70,) vector, so the forget gate must have a weight matrix of shape (50, 70), such that dot(W, [x_t, h_t]).shape = (50,).
So my question is: am I looking at an LSTM block with 50 cells when C_t.shape = (50,)? Or am I misunderstanding what it means for an LSTM layer to have 50 cells?
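To make the shape reasoning above concrete, here is a minimal NumPy sketch of the forget-gate step. All names (x_t, h_prev, W_f, etc.) are illustrative, not from any library:

```python
import numpy as np

n_input, n_cells = 20, 50
rng = np.random.default_rng(0)

x_t = rng.standard_normal(n_input)      # input vector, shape (20,)
h_prev = rng.standard_normal(n_cells)   # previous hidden state, shape (50,)
C_prev = rng.standard_normal(n_cells)   # previous cell state, shape (50,)

concat = np.concatenate([x_t, h_prev])  # shape (70,)

# Forget-gate parameters: one row of weights per cell.
W_f = rng.standard_normal((n_cells, n_input + n_cells))  # shape (50, 70)
b_f = np.zeros(n_cells)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

f_t = sigmoid(W_f @ concat + b_f)   # forget gate output, shape (50,)
C_t = f_t * C_prev                  # point-wise multiply keeps shape (50,)

print(f_t.shape, C_t.shape)  # (50,) (50,)
```

Every per-cell quantity (gate output, cell state, hidden state) stays a length-50 vector; only the weight matrices mix the 70 concatenated inputs down to 50 outputs.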
Upvotes: 3
Views: 748
Reputation: 28022
I understand what you are getting confused by. Basically, the black line connecting the two boxes at the top, which represents the cell state, is actually a bundle of 50 very thin lines grouped together, one per cell. The forget gate likewise outputs 50 values, and these 50 values are multiplied point-wise with the 50 values of the cell state.
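In other words, every point-wise operation in the diagram acts elementwise on a length-50 vector. A small NumPy illustration (array names are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Cell state: the "50 small lines" carried as one array of 50 values.
C_prev = rng.standard_normal(50)

# Forget gate output: 50 values, each squashed into (0, 1) by a sigmoid.
f_t = 1.0 / (1.0 + np.exp(-rng.standard_normal(50)))

# Point-wise multiplication: each forget value scales the
# corresponding component of the cell state independently.
C_scaled = f_t * C_prev
print(C_scaled.shape)  # (50,)
```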
Upvotes: 4