Reputation: 842
I have the following code for time series prediction with an RNN, and I would like to know whether, for the test data, I am predicting one day in advance:
# -*- coding: utf-8 -*-
"""
Time Series Prediction with RNN
"""
import pandas as pd
import numpy as np
from tensorflow import keras
#%% Configure parameters
epochs = 5
batch_size = 50
steps_backwards = int(1 * 4 * 24)
steps_forward = int(1 * 4 * 24)
split_fraction_trainingData = 0.70
split_fraction_validationData = 0.90
#%% "Reading the data"
dataset = pd.read_csv('C:/User1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
df = dataset
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:2]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
#%% Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X) * split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X) * split_fraction_validationData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
indexWithYLabelsInSeriesReshapedY = 0
lengthOfTheYData = len(data_Y) - steps_backwards - steps_forward
Y = np.empty((lengthOfTheYData, steps_backwards, steps_forward))
# For every input position, the target holds the next steps_forward values of the label column
for step_ahead in range(1, steps_forward + 1):
    Y[..., step_ahead - 1] = series_reshaped_Y[..., step_ahead:step_ahead + steps_backwards, indexWithYLabelsInSeriesReshapedY]
Y_train = Y[:timeslot_x_train_end]
Y_valid = Y[timeslot_x_train_end:timeslot_x_valid_end]
Y_test = Y[timeslot_x_valid_end:]
#%% Build the model and train it
model = keras.models.Sequential([
    keras.layers.SimpleRNN(90, return_sequences=True, input_shape=[None, 2]),
    keras.layers.SimpleRNN(60, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(steps_forward))
    #keras.layers.Dense(steps_forward)
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_valid, Y_valid))
#%% #Predict the test data
Y_pred = model.predict(X_test)
prediction_lastValues_list = []
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(Y_pred[i][0][steps_forward - 1])
#%% Create the dataframe for the whole data
wholeDataFrameWithPrediction = pd.DataFrame(X_test[:, 0])
wholeDataFrameWithPrediction.rename(columns={indexWithYLabelsInData: 'actual'}, inplace=True)
wholeDataFrameWithPrediction.rename(columns={1: 'Feature 1'}, inplace=True)
wholeDataFrameWithPrediction['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediction['difference'] = (wholeDataFrameWithPrediction['predictions'] - wholeDataFrameWithPrediction['actual']).abs()
wholeDataFrameWithPrediction['difference_percentage'] = (wholeDataFrameWithPrediction['difference'] / wholeDataFrameWithPrediction['actual']) * 100
I define steps_forward = int(1 * 4 * 24), which is basically one full day (at 15-minute resolution, 1 * 4 * 24 = 96 time steps). I predict the test data using Y_pred = model.predict(X_test), and I create a list with the predicted values using
for i in range(0, len(Y_pred)): prediction_lastValues_list.append(Y_pred[i][0][steps_forward - 1])
As the input and output data of RNNs are quite confusing to me, I am not sure whether I am predicting one day in advance for the test dataset, meaning 96 time steps into the future. What I actually want is to read historic data and then predict the next 96 time steps based on the historic 96 time steps. Can anyone tell me whether that is what this code does?
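To make my question more concrete, here are the array shapes I expect from the slicing above (just my understanding, assuming the CSV yields a datetime index plus two value columns):
# Quick sanity check of the shapes (my expectation, derived from the slicing above)
print(X_test.shape)   # expected: (number_of_test_windows, 96, 2)  -> 96 historic steps, 2 features
print(Y_test.shape)   # expected: (number_of_test_windows, 96, 96)
print(Y_pred.shape)   # expected: (number_of_test_windows, 96, 96) -> 96 outputs per input time step
# Y_pred[i][0][steps_forward - 1] is what I take as "the forecast 96 steps ahead of the
# first input time step of window i" - this is exactly the part I am unsure about.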
Here is a link to some test data that I created randomly. Do not pay attention to the actual values, only to the structure of the prediction: Download Test Data
Am I forecasting 96 steps in advance with the given code? (My code is based on a tutorial that can be found here: Tutorial RNN for electricity price prediction.)
Reminder: Can anyone tell me something about my question? Or do you need further information? If so, please let me know. I would highly appreciate your comments and would be quite thankful for your help. I will also award a bounty for a useful answer.
Upvotes: 1
Views: 1184
Reputation: 26708
So if your goal is to predict the next 96 steps given the previous 96 steps, I think you are over-complicating things with your current model. Why not start off with something simple like this:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
np.random.seed(42)
tf.random.set_seed(42)
df = pd.read_csv('TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
df = df.drop('value', axis=1)  # keep only the temperature column
steps = 96
scaler = MinMaxScaler()
data = scaler.fit_transform(df.values)
series_reshaped = np.array([data[i:i + (steps+steps)].copy() for i in range(len(data) - (steps + steps))])
x_train_index = int(len(series_reshaped) * 0.80)
x_valid_index = int(len(series_reshaped) * 0.10)
x_test_index = x_train_index + x_valid_index
X_train = series_reshaped[:x_train_index, :steps]
X_valid = series_reshaped[x_train_index: x_test_index, :steps]
X_test = series_reshaped[x_test_index:, :steps]
Y_train = series_reshaped[:x_train_index, steps:]
Y_valid = series_reshaped[x_train_index: x_test_index, steps:]
Y_test = series_reshaped[x_test_index:, steps:]
model = tf.keras.models.Sequential([
    tf.keras.layers.SimpleRNN(96, return_sequences=True, input_shape=(None, 1)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(0.001))
history = model.fit(X_train, Y_train, epochs=20,
                    validation_data=(X_valid, Y_valid))
You simply split your data into the previous 96 steps as the input and the following 96 steps as your "labels". After training, just make your predictions with your test data:
import matplotlib.pyplot as plt
Y_pred = model.predict(X_test)
prediction_list = []
for i in range(0, len(Y_pred)):
    prediction_list.append(Y_pred[i][0])
prediction_df = pd.DataFrame((Y_test[:, 0]))
prediction_df.rename(columns = {0:'actual'}, inplace = True)
prediction_df['predictions'] = prediction_list
prediction_df['difference'] = (prediction_df['predictions'] - prediction_df['actual']).abs()
prediction_df['difference_percentage'] = ((prediction_df['difference'])/(prediction_df['actual']))*100
print(prediction_df)
fig, ax = plt.subplots(figsize = (24,12))
ax.set_title('Temperatures across time', fontsize=20)
ax.set_xlabel('Timesteps', fontsize=20)
ax.tick_params(axis='both', which='major', labelsize=20)
ax.set_ylabel('Temperature', fontsize=20)
plt1 = ax.plot(prediction_df['predictions'][steps:], color = 'g', label='predictions')
plt2 = ax.plot(prediction_df['actual'][steps:], color = 'r', label='actual')
ax.legend(loc='upper left', prop={'size': 20})
actual predictions difference difference_percentage
0 0.540650 [0.52996427] [0.010686159] [1.9765377]
1 0.550813 [0.5463712] [0.0044417977] [0.8064075]
2 0.544715 [0.54527795] [0.00056248903] [0.1032629]
3 0.543360 [0.5469178] [0.003557384] [0.65470064]
4 0.547425 [0.5332471] [0.014178336] [2.590003]
.. ... ... ... ...
977 0.410569 [0.440537] [0.029967904] [7.2991133]
978 0.395664 [0.44218686] [0.046522915] [11.758189]
979 0.414634 [0.448785] [0.03415087] [8.236386]
980 0.414634 [0.43778685] [0.023152709] [5.5838885]
981 0.409214 [0.45098385] [0.041769773] [10.207315]
Note that this model can be improved in a lot of ways, but I want you to understand the basics, which is why I tried to make it as simple as possible. After you have understood this approach, you can try an autoregressive approach as mentioned by elbe. Also note that I have not de-normalised your data, which is why you get very low values.
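If you want the predictions back in the original units, you can invert the scaling along these lines (a minimal sketch, assuming the scaler above was fitted on exactly one column):
# Undo the MinMax scaling to get predictions and actuals back in the original units
# (sketch - assumes `scaler` was fitted on a single column, as in the code above)
pred_original = scaler.inverse_transform(np.array(prediction_list).reshape(-1, 1)).ravel()
actual_original = scaler.inverse_transform(prediction_df['actual'].values.reshape(-1, 1)).ravel()
prediction_df['predictions_original'] = pred_original
prediction_df['actual_original'] = actual_original
print(prediction_df[['actual_original', 'predictions_original']].head())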
Upvotes: 1
Reputation: 1508
First, I suggest you read TensorFlow's tutorial on time series forecasting. I played around a bit with your code and the data provided. The first important thing is that only the temperature column contains information. In the code below, I prepare the data so that X holds a time window of 96 samples/steps and Y holds the next step. X is of dimension (n_samples, 96, 1) and Y of dimension (n_samples,). I use only steps_backwards points for the past (and discard the future for simplicity, without affecting the generality). I tried different models (a simple fully connected network, RNN + FC, etc.). In the model below I do mean pooling (using the functional API rather than the sequential model definition) so that I end up with a single predicted value.
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards, 1][:, :, np.newaxis]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards, 1][:, :, np.newaxis]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards, 1][:, :, np.newaxis]
Y_train = series_reshaped_X[:timeslot_x_train_end, steps_backwards, 1]
Y_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards, 1]
Y_test = series_reshaped_X[timeslot_x_valid_end:, steps_backwards, 1]
# define the model
input = tf.keras.Input(shape=(96, 1))
x = input
x = keras.layers.SimpleRNN(10, return_sequences=False, input_shape=[96, 1])(x)
x = keras.layers.Dense(5)(x)
x = tf.reduce_mean(x, axis=1)
model = tf.keras.Model(inputs=input, outputs=x)
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mae'])
With return_sequences=False, the RNN outputs only the last predicted value.
Model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 96, 1)] 0
_________________________________________________________________
simple_rnn_27 (SimpleRNN) (None, 10) 120
_________________________________________________________________
dense_21 (Dense) (None, 5) 55
_________________________________________________________________
tf.math.reduce_mean_3 (TFOpL (None,) 0
=================================================================
Total params: 175
Trainable params: 175
Non-trainable params: 0
If you set return_sequences=True, the entire output sequence is output, but the prediction time step in the RNN is still one. It is explained here.
One way to predict more steps is an autoregressive approach, i.e. concatenating the n-1 previous data points with the predicted value to obtain the next value (a minimal sketch of such a loop is given at the end of this answer). Another (better) way is to consider that the RNN captures the time dependency in the input, so another possible model could be, if we consider that the input and the output data have the same shape:
input = tf.keras.Input(shape=(96, 1))
x = input
x = keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[96, 1])(x)
x = keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=input, outputs=x)
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mae'])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_7 (InputLayer) [(None, 96, 1)] 0
_________________________________________________________________
simple_rnn_29 (SimpleRNN) (None, 96, 10) 120
_________________________________________________________________
dense_23 (Dense) (None, 96, 1) 11
=================================================================
Total params: 131
Trainable params: 131
Non-trainable params: 0
In a way, you can think of the RNN as being able to capture the temporal dependencies in the sequence. It can be combined with other layers to build a better predictor (e.g. a dense layer as you did, stacked RNNs, etc.).
Note that the number of parameters in the model summary gives you an idea of the network's ability to learn complex relationships between inputs and outputs (and of the risk of overfitting if the number of parameters is too high).
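For completeness, the autoregressive approach mentioned above could look roughly like this (a minimal sketch, not a tested implementation, assuming the first model above, which maps the last 96 values to a single next value, has already been trained):
# Autoregressive multi-step forecast (sketch): predict one step, feed it back in as
# the newest input value, and repeat until 96 new steps have been generated.
# Assumes a trained single-step model with input shape (96, 1) and a scalar output.
def autoregressive_forecast(model, last_window, n_steps=96):
    window = last_window.copy()                   # shape (96, 1): the most recent observations
    predictions = []
    for _ in range(n_steps):
        pred = model.predict(window[np.newaxis, :, :])   # shape (1,) for the model above
        predictions.append(float(pred[0]))
        # slide the window: drop the oldest value, append the prediction as the newest one
        window = np.concatenate([window[1:], pred.reshape(1, 1)], axis=0)
    return np.array(predictions)

forecast_96_steps = autoregressive_forecast(model, X_test[0], n_steps=96)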
Upvotes: 0