Kosmylo

Reputation: 428

Warning for input shape in LSTM model

I have hourly time series data of electricity consumption with shape (17544, 1) in the following format:

[[17.6]
 [38.2]
 [39.4]
 ...
 [46. ]
 [44. ]
 [40.2]]

My goal is to use the last 7 days of data (24*7 = 168 hourly values) as input and predict the next 24 hours of electricity consumption.

I am using the following script to prepare the dataset for training and testing:

# Split into training/test sets
train_size = int(len(data) * 0.7)
val_size = int(len(data) * 0.2)
train, val, test = data[:train_size], data[train_size:(train_size + val_size)], data[(train_size + val_size):]

# Prepare the data in a format required for LSTM (samples, timesteps, features)

def Create_Dataset(df, lookback=1, prediction_horizon=1):
    X, Y = [], []
    for i in range(lookback, len(df) - prediction_horizon + 1):  # stop where a full horizon still fits
        X.append(df[i-lookback : i, 0])
        Y.append(df[i : i + prediction_horizon, 0])
    return np.array(X), np.array(Y)

lookback = 7 * 24
prediction_horizon = 24
X_train, Y_train = Create_Dataset(train, lookback, prediction_horizon)
X_val, Y_val = Create_Dataset(val, lookback, prediction_horizon)
X_test, Y_test   = Create_Dataset(test, lookback, prediction_horizon)

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_val = np.reshape(X_val, (X_val.shape[0], X_val.shape[1], 1))
X_test  = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

The model is of the following form:

model = Sequential()
model.add(LSTM(64, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(prediction_horizon))

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_absolute_error'])

I have trained the model successfully and now need to validate it with external data. I expect that by giving the following input of shape (168, 1):

[[17.6]
 [38.2]
 [39.4]
 ...
 [46.9]
 [48.6]
 [46.1]]

I will get 24 predicted values, but instead I get an output of shape (168, 24) and the following warning:

WARNING:tensorflow:Model was constructed with shape (None, 168, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 168, 1), dtype=tf.float32, name='lstm_3_input'), name='lstm_3_input', description="created by layer 'lstm_3_input'"), but it was called on an input with incompatible shape (None, 1, 1).

Any idea of what is wrong here?

Upvotes: 0

Views: 407

Answers (1)

sibidora

Reputation: 41

In Keras, LSTMs take a 3D input with shape [batch, timesteps, feature]. The batch dimension is generally shown as None because it can vary (you can see this in the warning). Your timesteps is 168 and feature is 1, since your only feature is the consumption value.

I think the problem is that you give an input of shape (168, 1), which is not 3D. Keras reads the first axis as the batch axis, so it is as if you had a batch of 168 samples with a single timestep each instead of 1 sample with 168 timesteps. My guess is that a trailing dimension gets added somewhere (either by a reshape in your code or by Keras itself), so the input becomes (168, 1, 1); otherwise I would expect feeding a 2D array to give an error instead of a warning. That is why the warning says your input is (None, 1, 1), and why you get one 24-value prediction per row, i.e. an output of shape (168, 24). So you just have to reshape the input into (1, 168, 1).

To summarize: your input must be a 3D tensor with shape [batch, timesteps, feature]. If you have a single sample with a 2D shape, just reshape it into (1, 168, 1).
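
As a minimal sketch of that reshape, using only NumPy: the array `window` below is a placeholder standing in for your 168 external values, and the names `window`, `batch` and `misread` are mine, not from your code.

```python
import numpy as np

# Placeholder for the 168 hourly values, shape (168, 1) as in the question.
window = np.arange(168, dtype="float32").reshape(168, 1)

# Add a leading batch axis: 1 sample, 168 timesteps, 1 feature.
batch = window[np.newaxis, ...]        # same as window.reshape(1, 168, 1)
print(batch.shape)                     # (1, 168, 1)

# For comparison: a trailing axis instead gives 168 samples of 1 timestep each,
# which matches the (None, 1, 1) shape reported in the warning.
misread = window[..., np.newaxis]
print(misread.shape)                   # (168, 1, 1)
```

With the model from the question, passing `batch` to `model.predict` should then give an output of shape (1, 24), so the 24 predicted hours are in row 0.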

Lastly, one thing I want to mention is that you are predicting the next 24 hours all at once (in the Dense layer). A common alternative with LSTMs is to predict one step at a time: after each prediction, the output is fed back into the LSTM as if it were the correct value. Predicting all 24 values in one shot does work, but it does not use the recurrence for the forecast steps themselves. Unfortunately I am not sure how this can be implemented in Keras, but here is some pseudocode to describe the general idea.

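# Pseudocode: lstm() stands for a call that returns a prediction and the new state
out, state = lstm(input.reshape(1, 168, 1))  # warm up on the 168 known hours -> hour 1
predictions = [out]
for i in range(23):  # hours 2 to 24
    out, state = lstm(out, state)  # feed the last prediction back in
    predictions.append(out)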

Here you give the output of the LSTM back to itself at the next timestep, together with the state. LSTMs have a state that is passed along and updated at each timestep (this is how they carry information through time). Each out tensor corresponds to the prediction for one hour. You can check Andrej Karpathy's famous blog post (The Unreasonable Effectiveness of Recurrent Neural Networks) to get a better idea. In the Character-Level Language Models part of that post, a similar idea is used to build a network that generates text one character at a time, starting from a single letter. The idea is pretty much the same.
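
To make the pseudocode above concrete, here is a rough NumPy sketch of the same feedback loop. It is not Keras code and not a trained model: the LSTM cell is written by hand, all weights are random, `history` is a placeholder for your scaled data, and the names (`lstm_step`, `w_out`, `units` and so on) are made up for illustration. It only shows how the state and the last prediction are passed along.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM timestep. x: (1,), h and c: (units,). Returns the new h and c."""
    z = W @ x + U @ h + b                 # pre-activations, shape (4 * units,)
    i, f, g, o = np.split(z, 4)           # input, forget, candidate, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
units = 8
# Assumption: random, untrained weights, used only to show the data flow.
W = rng.normal(scale=0.1, size=(4 * units, 1))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)
w_out = rng.normal(scale=0.1, size=units)  # maps the hidden state to one value

history = rng.uniform(0.0, 1.0, size=(168, 1))  # placeholder for 168 known hours

# Warm up: run the cell over the known history to build up the state.
h, c = np.zeros(units), np.zeros(units)
for x in history:
    h, c = lstm_step(x, h, c, W, U, b)

# Roll forward: each prediction becomes the input of the next step.
out = w_out @ h                            # prediction for hour 1
predictions = [out]
for _ in range(23):                        # hours 2 to 24
    h, c = lstm_step(np.array([out]), h, c, W, U, b)
    out = w_out @ h
    predictions.append(out)

predictions = np.array(predictions)
print(predictions.shape)                   # (24,)
```

In a real setup the cell and the output projection would be trained to predict the next hour, and errors can accumulate over the 24 fed-back steps, so it is worth comparing this against the direct Dense(24) output on your validation data.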

Upvotes: 1
