Reputation: 11937
I am trying to implement a multi-step forecasting LSTM model in Keras. The shapes of the data are as follows:
X : (5831, 48, 1)
y : (5831, 1, 12)
The model that I am trying to use is:
power_in = Input(shape=(X.shape[1], X.shape[2]))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
dropout=0.412563, kernel_initializer=power_lstm_init, return_sequences=True)(power_in)
main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)
While trying to train the model like this:
hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)
I am getting the following error:
ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)
How to fix this?
Upvotes: 1
Views: 250
Reputation: 33460
According to your comment:
[The] data I have is like t-48, t-47, t-46, ..., t-1 as the past data and t+1, t+2, ..., t+12 as the values that I want to forecast
you may not need to use a TimeDistributed layer at all: first, just remove the return_sequences=True argument of the LSTM layer. After doing so, the LSTM layer will encode the input timeseries of the past into a vector of shape (50,). Now you can feed it directly to a Dense layer with 12 units:
# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))
power_in = Input(shape=X.shape[1:])
power_lstm = LSTM(50, recurrent_dropout=0.4128,
dropout=0.412563,
kernel_initializer=power_lstm_init)(power_in)
main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
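If it helps, the rest of the wiring for this first variant could look roughly like this (a sketch: the optimizer and loss here are assumptions, not taken from your code, and y_valid needs the same reshape as y):
from keras.models import Model

forecaster = Model(power_in, main_out)
# 'adam' and 'mse' are placeholder choices; use whatever you had before
forecaster.compile(optimizer='adam', loss='mse')

hist = forecaster.fit(X, y, epochs=325, batch_size=16,
                      validation_data=(X_valid, y_valid.reshape(-1, 12)),
                      verbose=1, shuffle=False)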
Alternatively, if you would like to use a TimeDistributed layer, and considering that the output is a sequence itself, we can explicitly enforce this temporal dependency in the model by using another LSTM layer before the Dense layer (plus a RepeatVector layer after the first LSTM layer to turn its output into a timeseries of length 12, i.e. the same length as the output timeseries):
# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))
power_in = Input(shape=(48, 1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
dropout=0.412563,
kernel_initializer=power_lstm_init)(power_in)
rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)
model = Model(power_in, main_out)
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 48, 1) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 50) 10400
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 12, 32) 10624
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1) 33
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
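Training this second variant works the same way, except that the targets keep the extra length-one feature axis (again a sketch with an assumed optimizer and loss):
model.compile(optimizer='adam', loss='mse')

# targets must be shaped (num_samples, 12, 1) to match the TimeDistributed output
hist = model.fit(X, y, epochs=325, batch_size=16,
                 validation_data=(X_valid, y_valid.reshape(-1, 12, 1)),
                 verbose=1, shuffle=False)

# predictions come back as (num_samples, 12, 1), i.e. one value per future timestep
preds = model.predict(X_valid)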
Of course, in both models you may need to tune the hyperparameters (e.g. the number of LSTM layers, the number of units in each LSTM layer, etc.) to be able to accurately compare them and achieve good results.
Side note: actually, in your scenario you don't need to use a TimeDistributed layer at all, because (currently) the Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent.
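If you want to check that equivalence yourself, a quick standalone comparison (not part of the original code) shows both layers give the same output shape and parameter count:
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed

inp = Input(shape=(12, 32))
td_model = Model(inp, TimeDistributed(Dense(1))(inp))
dense_model = Model(inp, Dense(1)(inp))

print(td_model.output_shape, td_model.count_params())        # (None, 12, 1) 33
print(dense_model.output_shape, dense_model.count_params())  # (None, 12, 1) 33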
Upvotes: 1