Khoa Ngo

Reputation: 117

Understanding multi-layer LSTM

I'm trying to understand and implement a multi-layer LSTM. The problem is I don't know how the layers connect. I have two possibilities in mind:

  1. At each timestep, the hidden state H of the first LSTM will become the input of the second LSTM.

  2. At each timestep, the hidden state H of the first LSTM will become the initial value for the hidden state of the second LSTM, and the input of the first LSTM will become the input for the second LSTM.

Please help!

Upvotes: 3

Views: 12200

Answers (4)

Mina

Reputation: 748

PyTorch's multi-layer LSTM implementation feeds the hidden state of each layer as the input to the next layer. So your first assumption is correct.
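For example, a minimal sketch (the sizes are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

# num_layers=2 stacks two LSTMs: at every timestep, the hidden state
# of layer 1 is fed as the input to layer 2.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(4, 7, 10)        # (batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 7, 20])  hidden states of the top layer
print(h_n.shape)     # torch.Size([2, 4, 20])  final hidden state per layer
```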

Upvotes: 1

Ido Cohn

Reputation: 1705

TL;DR: Each LSTM cell at time t and layer l has inputs x(t) and hidden state h(l, t). In the first layer, the inputs are the actual sequence input x(t) and the previous hidden state h(l, t-1); in each subsequent layer, the input is the hidden state of the corresponding cell in the previous layer, h(l-1, t).

From https://arxiv.org/pdf/1710.02254.pdf:

To increase the capacity of GRU networks (Hermans and Schrauwen 2013), recurrent layers can be stacked on top of each other. Since GRU does not have two output states, the same output hidden state h'2 is passed to the next vertical layer. In other words, the h1 of the next layer will be equal to h'2. This forces GRU to learn transformations that are useful along depth as well as time.
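To make the wiring explicit, the same stacking can be written by hand with two single-layer LSTMs. Here is a sketch in PyTorch with made-up sizes; this is essentially what num_layers=2 does internally, apart from inter-layer dropout:

```python
import torch
import torch.nn as nn

layer1 = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
layer2 = nn.LSTM(input_size=20, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)   # (batch, seq_len, input_size)

h1, _ = layer1(x)           # h1[:, t, :] is h(l-1, t) in the notation above
h2, _ = layer2(h1)          # layer 2's input at time t is h(l-1, t), not x(t)
```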

Upvotes: 3

Tushar Gupta

Reputation: 1669

I'll borrow from colah's blog post, condensed to the part that matters here.

[Diagram from colah's blog: the repeating chain of LSTM cells, each containing four neural network layers]

As the image above shows, LSTMs have a chain-like structure, and each cell contains four neural network layers.

The value we pass to the next timestep and the value we pass up to the next layer are the same thing: the hidden state, which is the cell's output. This output is based on the cell state, but it is a filtered version of it. First, a sigmoid layer decides which parts of the cell state we're going to output. Then we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to pass.

The cell state itself (the top arrow) is also passed along to the next timestep; a sigmoid layer (the forget gate) decides, based on the new input and the previous hidden state, how much of that information to keep.
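In code, the output step described above could be sketched like this (a hypothetical helper, not a full LSTM cell; W_o, b_o, and the updated cell state c_t are assumed to come from the rest of the cell):

```python
import torch

def output_step(x_t, h_prev, c_t, W_o, b_o):
    """One LSTM output step: filter the cell state into the hidden state."""
    # Sigmoid output gate: decides which parts of the cell state to expose.
    o_t = torch.sigmoid(torch.cat([h_prev, x_t], dim=-1) @ W_o + b_o)
    # Push the cell-state values to (-1, 1) and keep only the gated parts.
    h_t = o_t * torch.tanh(c_t)
    # h_t is both the output passed up to the next layer and the hidden
    # state passed to the next timestep.
    return h_t
```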

Hope this helps.

Upvotes: 2

Sorin

Reputation: 11968

There's no definite answer. It depends on your problem and you should try different things.

The simplest thing you can do is to pipe the output of the first LSTM (the full per-timestep output sequence, not the final hidden state) as the input to the second LSTM layer, instead of applying some loss to it directly. That should work in most cases.

You can try piping the hidden state as well, but I haven't seen that very often.

You can also try other combinations. Say, for the second layer you input both the output of the first layer and the original input (sketched below). Or you feed the second layer the first layer's output from both the current and the previous timestep.
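For instance, the "output of the first layer plus the original input" variant could look like this in PyTorch (sizes are hypothetical):

```python
import torch
import torch.nn as nn

layer1 = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
# The second layer sees layer 1's outputs concatenated with the raw input.
layer2 = nn.LSTM(input_size=20 + 10, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)                       # (batch, seq_len, features)
out1, _ = layer1(x)
out2, _ = layer2(torch.cat([out1, x], dim=-1))  # skip-style connection
```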

It all depends on your problem and you need to experiment to see what works for you.

Upvotes: -1
