Reputation: 11
I have a script that builds an LSTM model, fits it to training data, and predicts on some test data. (Just for fun I also plot predictions on the training data, since they should be close to the training targets; it's a quick sanity check that the model is constructed properly.)
1) The first problem is that the predictions on the test and training data are totally different, depending on whether I predict on the training or the test data first.
2) The second problem might be related to the first one: every time I run my script, the predictions on the test data are totally different. I know neural networks involve some randomness, but as you can see in my resulting plots, the difference is far beyond that:
Edit 1: I tried setting 'stateful=False' as suggested in the comments, without success.
Edit 2: I've updated the script and the plots and provided some basic sine-wave sample data in the new code. The problems still exist even in that simple example.
Resulting plots of predictions with stateful=False
I have an input signal X as a sine wave with 100 time steps and random amplitude and frequency. My target y correlates with X (in every time step) and is, in this case, also a sine wave. The shapes of my data are:
X_train.shape = (100, 1, 1)
y_train.shape = (100,)
X_test.shape = (100, 1, 1)
y_test.shape = (100,)
I'm using an LSTM network to fit a complete sine wave, so the training batch size is 100, and I predict every single point of the test signal, so the batch size for prediction is 1. I also manually reset the state of the LSTM after every epoch, as described here: https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
For building my network I followed the "Keras rules" mentioned here: Delayed echo of sin - cannot reproduce Tensorflow result in Keras
I know the basic approaches to solving such problems, like the ones suggested here: Wrong predictions with LSTM Neural Network, but nothing has worked for me.
I'm grateful for any kind of help with this, and also for advice on asking better questions in case I did something wrong, because this is my first post here on Stack Overflow.
Thank you all! Here is my code example:
import numpy as np
import matplotlib.pyplot as plt
from keras import models, layers, optimizers
from keras.callbacks import Callback
# create training sample data
Fs = 100 # sample rate
z = np.arange(100)
f = 1 # frequency in Hz
X_train = np.sin(2 * np.pi * f * z / Fs)
y_train = 0.1 * np.sin(2 * np.pi * f * z / Fs)
# create test sample data
f = 1 # frequency in Hz
X_test = np.sin(2 * np.pi * f * z / Fs) * 2
y_test = 0.2 * np.sin(2 * np.pi * f * z / Fs)
# convert data into LSTM compatible format
y_train = np.array(y_train)
y_test = np.array(y_test)
X_train = X_train.reshape(X_train.shape[0], 1, 1)
X_test = X_test.reshape(X_test.shape[0], 1, 1)
# build and compile model
model = models.Sequential()
model.add(layers.LSTM(1, batch_input_shape=(len(X_train), X_train.shape[1], X_train.shape[2]),
return_sequences=False, stateful=False))
model.add(layers.Dense(X_train.shape[1], input_shape=(1,), activation='linear'))
model.compile(optimizer=optimizers.Adam(lr=0.01, decay=0.008, amsgrad=True), loss='mean_squared_error', metrics=['mae'])
# Keras callback to make sure the LSTM cell state is reset after each epoch
class ResetStatesAfterEachEpoch(Callback):
    def on_epoch_end(self, epoch, logs=None):
        self.model.reset_states()
reset_state = ResetStatesAfterEachEpoch()
callbacks = [reset_state]
# fit model to training data
history = model.fit(X_train, y_train, epochs=20000, batch_size=len(X_train),
shuffle=False, callbacks=callbacks)
# re-define the LSTM model with the weights of the fitted model to predict single points, so the batch size is also re-defined to 1
new_batch_size = 1
new_model = models.Sequential()
new_model.add(layers.LSTM(1, batch_input_shape=(new_batch_size, X_test.shape[1], X_test.shape[2]), return_sequences=False,
stateful=False))
new_model.add(layers.Dense(X_test.shape[1], input_shape=(1,), activation='linear'))
# copy weights to new model
old_weights = model.get_weights()
new_model.set_weights(old_weights)
# single point prediction on train data
y_pred_train = new_model.predict(X_train, batch_size=new_batch_size)
# single point prediction on test data
y_pred_test = new_model.predict(X_test, batch_size=new_batch_size)
# plot predictions
plt.figure()
plt.plot(y_test, 'r', label='ground truth test',
linestyle='dashed', linewidth=0.8)
plt.plot(y_train, 'b', label='ground truth train',
linestyle='dashed', linewidth=0.8)
plt.plot(y_pred_test, 'g',
label='y pred test', linestyle='dotted',
linewidth=0.8)
plt.plot(y_pred_train, 'k',
label='y pred train', linestyle='-.',
linewidth=0.8)
plt.title('pred order: test, train')
plt.xlabel('time steps')
plt.ylabel('y')
plt.legend(prop={'size': 8})
plt.show()
Upvotes: 1
Views: 1431
Reputation: 11
So I found a solution. I don't know why it works (I'd appreciate a comment if someone does), but it works.
I added the derivative of X_train (here a cosine), so I have a multi-input LSTM with 2 features. The final X_train is built as in this code:
x = np.sin(2 * np.pi * f * z / Fs)
dx_dt = np.cos(2 * np.pi * f * z / Fs)
X_train = np.column_stack((x, dx_dt))
Even a time-shifted target like y_train = 5 * np.sin(2 * np.pi * f * (z + 51) / Fs)
was predicted quite well after training for 3000 epochs, with one LSTM layer of 3 neurons.
This is the resulting plot.
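For reference, here is a minimal sketch of how that 2-feature input could be reshaped and trained. The layer size (3 neurons), epoch count (3000) and the sample data are taken from the description above; everything else (optimizer settings, output layer) is my assumption, not the exact script I used:
# minimal sketch, assuming the same sine-wave sample data as in the question
import numpy as np
from keras import models, layers, optimizers

Fs = 100
z = np.arange(100)
f = 1
x = np.sin(2 * np.pi * f * z / Fs)
dx_dt = np.cos(2 * np.pi * f * z / Fs)
X_train = np.column_stack((x, dx_dt))               # shape (100, 2): signal + derivative
X_train = X_train.reshape(X_train.shape[0], 1, 2)   # (samples, time steps, features)
y_train = 5 * np.sin(2 * np.pi * f * (z + 51) / Fs) # time-shifted, scaled target

model = models.Sequential()
model.add(layers.LSTM(3, input_shape=(1, 2)))       # 1 LSTM layer, 3 neurons, 2 input features
model.add(layers.Dense(1, activation='linear'))
model.compile(optimizer=optimizers.Adam(lr=0.01), loss='mean_squared_error')
model.fit(X_train, y_train, epochs=3000, batch_size=len(X_train), shuffle=False)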
Upvotes: 0
Reputation: 56367
The problem is here:
model.add(layers.LSTM(1, batch_input_shape=(len(X_train), X_train.shape[1], X_train.shape[2]),
return_sequences=False, stateful=True))
You set stateful=True in the LSTM layer, which means that the hidden state is not reset after each prediction, which explains the effect you are seeing. If you do not want this behavior, you should set it to its default value of stateful=False, and it will work as a standard non-stateful LSTM.
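To illustrate, a minimal sketch (not the asker's exact script): with stateful=False the two predict calls below give the same results regardless of their order, because the state is reset automatically after each batch; with stateful=True you would have to call model.reset_states() between them to get the same effect.
# sketch: order-independent predictions with a non-stateful LSTM
from keras import models, layers

model = models.Sequential()
model.add(layers.LSTM(1, input_shape=(1, 1), stateful=False))  # default: state is reset per batch
model.add(layers.Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')

# ... after fitting, these calls are independent of each other:
# y_pred_test = model.predict(X_test, batch_size=1)
# y_pred_train = model.predict(X_train, batch_size=1)
# with stateful=True you would need model.reset_states() between the two calls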
Upvotes: 1