Sreeram TP

Reputation: 11937

Multi Step Forecast LSTM model

I am trying to implement a multi-step forecasting LSTM model in Keras. The shapes of my data are as follows:

X : (5831, 48, 1)
y : (5831, 1, 12)

The model that I am trying to use is:

power_in = Input(shape=(X.shape[1], X.shape[2]))

power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563, kernel_initializer=power_lstm_init, return_sequences=True)(power_in)

main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)

While trying to train the model like this:

hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)

I am getting the following error:

ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)

How can I fix this?

Upvotes: 1

Views: 250

Answers (1)

today

Reputation: 33460

According to your comment:

[The] data I have is like t-48, t-47, t-46, ..., t-1 as the past data and t+1, t+2, ..., t+12 as the values that I want to forecast

you may not need to use a TimeDistributed layer at all. First, just remove the return_sequences=True argument of the LSTM layer. After doing that, the LSTM layer will encode the input time series of the past into a vector of shape (50,). Now you can feed it directly to a Dense layer with 12 units:

import numpy as np
from keras.layers import Input, Dense, LSTM

# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))

power_in = Input(shape=X.shape[1:])
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
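
For completeness, the model could then be compiled and trained much like in your original snippet; note that the adam optimizer and mse loss below are assumptions, since the question does not show the compile step:

from keras.models import Model

forecaster = Model(power_in, main_out)
forecaster.compile(optimizer='adam', loss='mse')  # optimizer/loss are assumptions

# the validation labels must be reshaped the same way as y
hist = forecaster.fit(X, y, epochs=325, batch_size=16,
                      validation_data=(X_valid, np.reshape(y_valid, (-1, 12))),
                      verbose=1, shuffle=False)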

Alternatively, if you would like to use a TimeDistributed layer, and considering that the output is a sequence itself, you can explicitly enforce this temporal dependency in the model by using another LSTM layer before the Dense layer (plus a RepeatVector layer after the first LSTM layer to turn its output into a time series of length 12, i.e. the same length as the output time series):

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, LSTM, RepeatVector, TimeDistributed

# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))

power_in = Input(shape=(48, 1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

rep = RepeatVector(12)(power_lstm)  # repeat the encoding once per forecast step
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)

model = Model(power_in, main_out)
model.summary()

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 48, 1)             0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                10400     
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50)            0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 12, 32)            10624     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1)             33        
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
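
As a quick sanity check, you can verify that this model produces forecasts of the expected shape; the random input below is just illustrative dummy data:

import numpy as np

dummy_X = np.random.rand(4, 48, 1)  # 4 samples, 48 past time steps, 1 feature
preds = model.predict(dummy_X)
print(preds.shape)                  # (4, 12, 1), i.e. 12 forecast steps per sample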

Of course, in both models you may need to tune the hyper-parameters (e.g. the number of LSTM layers, the number of units in each LSTM layer, etc.) to be able to accurately compare the two approaches and achieve good results.


Side note: actually, in your scenario you don't need a TimeDistributed layer at all, because (currently) the Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent here.
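
A quick way to convince yourself of this is to compare the output shapes of the two variants on the same 3D input (a minimal sketch with arbitrary dimensions):

from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed

inp = Input(shape=(48, 50))
td_out = TimeDistributed(Dense(12))(inp)   # Dense applied to each of the 48 timesteps
plain_out = Dense(12)(inp)                 # Dense on a 3D tensor acts on the last axis

print(Model(inp, td_out).output_shape)     # (None, 48, 12)
print(Model(inp, plain_out).output_shape)  # (None, 48, 12)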

Upvotes: 1
