I'd like to expand on the question "Stateful LSTM: When to reset states?"
Suppose I train a stateful model as such:
```python
for i in range(epochs):
    model.fit(X_train, y_train, epochs=1, batch_size=1, shuffle=False)
    model.reset_states()
```
My training and test sets are from one time-series data set, with the test set following immediately after the training set.
Next, I want to evaluate the test set and get an array of the predictions.
```python
score = model.evaluate(X_test, y_test, batch_size=1, verbose=True)
prediction = model.predict(X_test, batch_size=1)
```
I feel as though resetting the model state at the end of the training loop will cause the evaluate or predict steps to be wrong, at least at the beginning of the set. Is that so? Should I not reset the state for the last epoch if the data continues sequentially into the test set?
Also, after I evaluate on the test set, do I need to restore the model's state to what it was at the end of the training set before I try to predict? Should I copy the model? Save and reload it?
Indeed, if you reset the states before evaluating the test set, the model will treat the test data as a whole new sequence and start it from the beginning. If the general behavior of the entire series doesn't change over time, the error may not be significant, but I wouldn't risk it.
If the test sequence is continuing the training sequence, then it should start with proper states for best results.
But I'd say you should reset the states at the top of the training loop, before each pass over the series, rather than after it. Then run evaluate and predict without any reset, so the states left by the last pass over the training data carry straight into the test set.
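Concretely, that ordering can be sketched like this (a minimal toy example with random data, assuming the tf.keras API; the model, sizes, and array names are illustrative stand-ins, not taken from the question):

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy stand-ins for one continuous series split into train/test
# (random data; sizes are illustrative only).
X_train = np.random.rand(20, 1, 1)   # 20 samples, 1 time step, 1 feature
y_train = np.random.rand(20, 1)
X_test = np.random.rand(5, 1, 1)
y_test = np.random.rand(5, 1)

model = Sequential([
    Input(batch_shape=(1, 1, 1)),    # stateful layers need a fixed batch size
    LSTM(8, stateful=True),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for i in range(2):
    model.reset_states()             # reset BEFORE each pass over the series
    model.fit(X_train, y_train, epochs=1, batch_size=1,
              shuffle=False, verbose=0)

# No reset here: the states left by the last training pass carry
# straight into the test set, which continues the same series.
score = model.evaluate(X_test, y_test, batch_size=1, verbose=0)
prediction = model.predict(X_test, batch_size=1, verbose=0)
```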
Not answered: I don't know whether the evaluate method brings the states back to where they were before it ran, but I believe it doesn't. You may also need to evaluate sequences that are long enough to fill your memory, in which case you'd have to evaluate in batches.
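If you do need to return to the end-of-training states after evaluating, one workaround is to snapshot and restore them yourself. This is a sketch only, assuming the Keras 2-style tf.keras API where stateful RNN layers expose a `states` property and `reset_states(states=...)` accepts replacement values; the toy model and names are illustrative:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential([Input(batch_shape=(1, 1, 1)), LSTM(4, stateful=True)])
model.compile(optimizer="adam", loss="mse")

# Push some data through so the states are non-trivial.
model.predict(np.random.rand(3, 1, 1), batch_size=1, verbose=0)

# Snapshot the layer's internal states (h and c for an LSTM).
lstm = next(l for l in model.layers if isinstance(l, LSTM))
saved = [np.array(s) for s in lstm.states]

# ... evaluating here would move the states forward ...
model.predict(np.random.rand(3, 1, 1), batch_size=1, verbose=0)

# Restore the snapshot before predicting from the saved point.
lstm.reset_states(states=saved)
restored = [np.array(s) for s in lstm.states]
ok = all(np.allclose(a, b) for a, b in zip(saved, restored))
```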
Off-topic: a misconception in the linked question:
In Keras, samples are sequences. The dimensions of a batch for recurrent layers are (sequences, timeSteps, features), where the number of sequences, the number of samples, and the batch size are exactly the same thing. (Check the documentation to confirm that the second dimension is "steps" in a sequence: https://keras.io/layers/recurrent/)
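A quick plain-NumPy illustration of that shape convention (the array and names here are made up for the example):

```python
import numpy as np

# A batch of 4 sequences, each 10 time steps long, with 3 features per step.
batch = np.zeros((4, 10, 3))
sequences, time_steps, features = batch.shape
# "sequences" (4) is simultaneously the number of samples and the batch size.
```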