Ragadabing

Reputation: 482

Is the state preserved between samples for one batch in a stateful LSTM in Keras?

Assume I want to classify time series, each of which has 33 time steps. I split them up into smaller chunks. So let's say I have the following input X_1 with shape (32, 3, 1), i.e. 32 samples, 3 time steps, 1 feature:

[
    [[1],  [2],  [3]]  # step 1 to step 3 from time series 1
    [[11], [14], [17]] # step 1 to step 3 from time series 2
    [[3],  [5],  [7]]  # step 1 to step 3 from time series 3
    ...
    [[9],  [7],  [2]]  # step 1 to step 3 from time series 32
]

and Y = [A, A, B, …, B] containing the labels for each of the 32 time series in this batch.

Now I run model.fit(X_1, Y).

Then I take the next 3 time steps for each time series as X_2:

[
    [[4],  [5],  [6]]  # step 4 to step 6 from time series 1
    [[20], [23], [26]] # step 4 to step 6 from time series 2
    [[9],  [11], [13]] # step 4 to step 6 from time series 3
    ...
    [[8],  [1],  [9]]  # step 4 to step 6 from time series 32
]

and again the same Y = [A, A, B, …, B].

Because I've split the time series up, I use a stateful model, so that the state from X_1 is kept for X_2.

Again I run model.fit(X_2, Y). I repeat this until I reach X_11, which contains time steps 31 to 33 of my input data. After calling model.fit(X_11, Y) I call model.reset_states(), because I'm done with the first batch of 32 time series and can start again at the beginning with a new batch of 32 time series.
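The chunking described above can be sketched in plain Python. This is only an illustration with made-up data: the `series` values and the helper name `make_chunks` are assumptions, not part of any Keras API. The point it shows is that row i of every chunk comes from time series i, which is what the stateful setup relies on.

```python
# Hypothetical toy data: 32 series, 33 time steps, 1 feature each.
n_series, n_steps, chunk_len = 32, 33, 3
series = [[[float(s * 100 + t)] for t in range(n_steps)] for s in range(n_series)]


def make_chunks(series, chunk_len):
    """Split every series into consecutive chunks of chunk_len time steps.

    Returns a list of batches; row i of every batch comes from series i.
    """
    total_steps = len(series[0])
    return [
        [s[start:start + chunk_len] for s in series]
        for start in range(0, total_steps, chunk_len)
    ]


chunks = make_chunks(series, chunk_len)  # chunks[0] plays the role of X_1

print(len(chunks))  # 11 chunks: X_1 ... X_11
print(len(chunks[0]), len(chunks[0][0]), len(chunks[0][0][0]))  # 32 3 1
```

Each element of `chunks` would then be fed to the model in order, with the rows kept in the same order, before resetting the states and moving on to the next 32 series.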

At least, until now I thought this was the way to do it. But now I've read that the state is preserved by default across samples in a batch. Does that mean that the state from the first 3 steps of time series 1 in X_1 is also used for the first 3 steps of time series 2? That wouldn't make sense: they have nothing in common, so the state shouldn't be shared between them. So which is correct?

Upvotes: 0

Views: 52

Answers (1)

Daniel Möller

Reputation: 86600

No. The states are matrices in which one of the dimensions is the batch size, meaning there is one row of state per sample.

Series 1 does not communicate with series 2.
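To make the "one row of state per sample" idea concrete, here is a minimal sketch using a toy scalar recurrent cell in plain Python. It is not Keras's LSTM implementation; the functions `rnn_step` and `run_chunk`, the weights, and the data are all assumptions made for illustration. It shows two things: changing series 2 leaves the state of series 1 untouched, and carrying the state from one chunk to the next gives the same result as processing the unsplit series.

```python
import math


def rnn_step(state, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar recurrent cell, applied row by row."""
    return [math.tanh(w_h * h + w_x * xi) for h, xi in zip(state, x)]


def run_chunk(state, chunk):
    """Feed a chunk of shape (batch, steps); return the new per-row state."""
    for t in range(len(chunk[0])):
        state = rnn_step(state, [row[t] for row in chunk])
    return state


# Two batches that differ only in the second row (series 2).
chunk_a = [[0.1, 0.2, 0.3], [1.1, 1.4, 1.7]]
chunk_b = [[0.1, 0.2, 0.3], [-5.0, 9.0, 2.0]]

state_a = run_chunk([0.0, 0.0], chunk_a)
state_b = run_chunk([0.0, 0.0], chunk_b)

print(state_a[0] == state_b[0])  # True: series 1 is unaffected by series 2
print(state_a[1] == state_b[1])  # False: only series 2's own state changed

# Carrying the state across chunks matches running the unsplit series.
next_chunk = [[0.4, 0.5, 0.6], [2.0, 2.3, 2.6]]
full = [a + b for a, b in zip(chunk_a, next_chunk)]
carried = run_chunk(state_a, next_chunk)
print(carried == run_chunk([0.0, 0.0], full))  # True
```

The state is a list with one entry per row of the batch, and each entry is updated only from the same row of the input, which mirrors the per-sample rows of the state matrices described above.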

Upvotes: 1
