Reputation: 5832
I'm trying to use an autoencoder to take a timeseries and reconstruct it. My data consists of 10 timeseries of length 365 with one feature, cut into look-back windows of 28 days. The autoencoder model is based on this Keras blog. So basically the model should take a sequence of 28 values and try to reconstruct it as its output. As you can see in the code, I trained the model and then gave it 100 sequences as a test set. Then I plotted the resulting value for each of the 28 sequence positions (see the picture). I want to see how each of these 28 timesteps is reconstructed, so the picture contains one plot per timestep (the blue line is the real/expected value and the orange line is the reconstructed result). For the first timestep the reconstruction is always bad, almost a constant value; it then gets better and better for the following timesteps, and for the last one it is almost able to reconstruct the real value. Why is that happening? I expected to see roughly the same pattern for all timesteps. How do you interpret these plots and the way the autoencoder is working here?
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, Reshape

seq_len = 28
n_features = 1
enc_hunits = 14
# data.shape == (5642, 28, 1)

inputEncoder = Input(shape=(seq_len, n_features), name='inputEncoder')
outEncoder = LSTM(enc_hunits, name='outputEncoder')(inputEncoder)  # 14-unit latent vector
encoder_model = Model(inputEncoder, outEncoder)

# Repeat the 14-unit code twice (28 values in total), then reshape to
# (28, 1) so that each decoder timestep receives one scalar of the code.
c = RepeatVector(seq_len // enc_hunits, name='inputDecoder')(outEncoder)
c_reshaped = Reshape((seq_len, n_features), name='ReshapeLayer')(c)
outDecoder = LSTM(1, return_sequences=True, name='decoderLSTM')(c_reshaped)

autoencoder = Model(inputEncoder, outDecoder)
autoencoder.compile(loss='mse', optimizer='rmsprop')
history = autoencoder.fit(data, data,  # validation_split_ratio and epochs are set elsewhere
                          validation_split=validation_split_ratio,
                          epochs=epochs,
                          )
test = data[:100, :, :] # get 100 examples from training
result = autoencoder.predict(test)
#....
plot_results(test, result, n_ts=seq_len)
from matplotlib import pyplot

def plot_results(exp, rec, n_ts=28):
    fig = pyplot.figure(figsize=(30, 30))
    fig.subplots_adjust(hspace=0.32, wspace=0.15)
    count = 1
    for irow in range(n_ts):
        # one subplot per timestep, laid out as n_ts/2 rows x 2 columns
        ax = fig.add_subplot(n_ts // 2, 2, count)
        ax.plot(exp[:, irow], "--", marker='o', label="Input")
        ax.plot(rec[:, irow], marker='o', label="Reconstructed", linewidth=3, alpha=0.5)
        ax.set_title("{}th timestep".format(irow))
        ax.legend()
        count += 1
    pyplot.savefig("all_timesteps.png")
    pyplot.clf()
UPDATE: What would be the difference if I removed the Reshape line and let RepeatVector alone do the repeating, seq_len times, like this: c = RepeatVector(seq_len, name='inputDecoder')(outEncoder)? So in my case it would repeat the vector 28 times instead of 2 times. How does that affect the decoder input? I tried it and plotted all the timesteps again, and this time none of the timesteps is reconstructed correctly. The first plot is the same as the first plot in the picture here, and the rest are almost the same as the second one in the picture. I wonder why?
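For reference, this is what the variant from the update looks like (a sketch; the encoder and the rest of the script are unchanged):

# RepeatVector alone: every decoder step now sees the full 14-unit latent vector
c = RepeatVector(seq_len, name='inputDecoder')(outEncoder)          # (batch, 28, 14)
outDecoder = LSTM(1, return_sequences=True, name='decoderLSTM')(c)  # (batch, 28, 1)
autoencoder = Model(inputEncoder, outDecoder)

# In contrast, the original RepeatVector(2) + Reshape slices the latent
# vector into 28 scalars and feeds a single scalar to each decoder step.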
Upvotes: 1
Views: 132
Reputation: 1246
The autoencoder uses a full window of 28 timesteps to predict:
seq_len = 28
At your 0th timestep, the model has only that single timestep of context available; this is what causes the results you are seeing.
Ideally you should start predicting at timesteps after the 27th, so that the autoencoder has a complete sequence length to predict on.
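To quantify this, you can average the squared reconstruction error over the test windows separately for each of the 28 positions (a minimal sketch reusing the test and result arrays from the question):

import numpy as np

# test and result both have shape (100, 28, 1)
per_step_mse = np.mean((test - result) ** 2, axis=(0, 2))  # -> shape (28,)
for t, err in enumerate(per_step_mse):
    print("timestep {:2d}: MSE = {:.5f}".format(t, err))

# If the explanation above holds, the error should be largest at t = 0
# and shrink towards the end of the window.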
Upvotes: 2