sirjay

Reputation: 1766

How to predict multiple features using keras with time series?

I don't know how to change the scaling step so that I can add new features and get a better forecast. The code below predicts stock prices from the Close value. Data:

                             Open    High     Low   Close  Adj Close   Volume
Datetime                                                                     
2020-03-10 09:30:00+03:00  5033.0  5033.0  4690.0  4840.0     4840.0   702508
2020-03-10 10:30:00+03:00  4840.0  4870.0  4700.0  4746.5     4746.5  1300648
2020-03-10 11:30:00+03:00  4746.5  4783.0  4706.0  4745.5     4745.5  1156482
2020-03-10 12:30:00+03:00  4745.5  4884.0  4730.0  4870.0     4870.0  1213268
2020-03-10 13:30:00+03:00  4874.0  4990.5  4867.5  4886.5     4886.5  1958028
...                           ...     ...     ...     ...        ...      ...
2020-04-03 14:30:00+03:00  5177.0  5217.0  5164.0  5211.5     5211.5   385696
2020-04-03 15:30:00+03:00  5212.0  5364.0  5191.0  5269.5     5269.5  1091066
2020-04-03 16:30:00+03:00  5270.0  5297.0  5209.0  5220.5     5220.5   518686
2020-04-03 17:30:00+03:00  5222.0  5271.0  5184.0  5220.5     5220.5   665096
2020-04-03 18:30:00+03:00  5217.5  5223.5  5197.0  5204.5     5204.5   261400

I want to add the Volume and Open features, but I get this error:

    predictions = scaler.inverse_transform(predictions)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 436, in inverse_transform
    X -= self.min_
ValueError: non-broadcastable output operand with shape (40,1) doesn't match the broadcast shape (40,3)

Q1: How should I change inverse_transform, and what else do I need to change (the input_shape argument, maybe) to get correct results?

Q2: The result will be a prediction of the Close value. But how do I also predict the Volume value? I guess I need to set model.add(Dense(2)), but can I make both predictions correctly in one script, or do I need to run the script separately for each target? How do I do that? How do I get Volume and then Open when using model.add(Dense(2))?

Full code:

from math import sqrt
from numpy import concatenate
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding
from keras.layers import LSTM
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf

start = (datetime.now() - timedelta(days=30))
end = (datetime.now() - timedelta(days=0))
df = yf.download(tickers="LKOH.ME", start=start.strftime("%Y-%m-%d"), end=end.strftime("%Y-%m-%d"), interval="60m")
df = df.loc[start.strftime("%Y-%m-%d"):end.strftime("%Y-%m-%d")]

# I need to add other features here
# df.filter(['Close', 'Open', 'Volume']) <-- this will make further an error with shapes
data = df.filter(['Close'])
dataset = data.values

#Get the number of rows to train the model on, 40 rows for test
training_data_len = len(dataset) - 40
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])

x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

model = Sequential()
# should i change to input_shape=(x_train.shape[1], 3) ?
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, batch_size=1, epochs=1)


test_data = scaled_data[training_data_len - 60: , :]
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])


x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions) # error here

Upvotes: 1

Views: 541

Answers (1)

Chris

Reputation: 1668

The problem is that you fit MinMaxScaler on the whole dataset, then split dataset into x_train and y_train, and later call inverse_transform on the predictions, which have the same shape as y_train rather than the shape the scaler was fitted on. I suggest you create x_train and y_train first and fit MinMaxScaler only to x_train. y_train doesn't need to be scaled for the model, and leaving it unscaled means you don't need to inverse_transform the predictions at all.

So instead of

#Get the number of rows to train the model on, 40 rows for test
training_data_len = len(dataset) - 40
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])

x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

Use

#Get the number of rows to train the model on, 40 rows for test
training_data_len = len(dataset) - 40
# Slice the raw dataset here: scaled_data no longer exists because scaling now happens after windowing
train_data = dataset[0:int(training_data_len), :]
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])

x_train, y_train = np.array(x_train), np.array(y_train)

scaler = MinMaxScaler(feature_range=(0,1))
x_train = scaler.fit_transform(x_train)  # Only scaling x_train

x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

and just delete the line predictions = scaler.inverse_transform(predictions).
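If you would rather keep the target scaled, a different option from the one above is to keep inverse_transform but give the target its own scaler. This is only a minimal sketch on toy data, not the approach suggested in this answer: the random array stands in for df.filter(['Close', 'Open', 'Volume']).values, and the names feature_scaler, target_scaler and fake_predictions are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for df.filter(['Close', 'Open', 'Volume']).values (assumption)
rng = np.random.default_rng(0)
dataset = rng.uniform(1.0, 100.0, size=(200, 3))
training_data_len = len(dataset) - 40

# A scaler fitted on 3 columns expects 3 columns back in inverse_transform
feature_scaler = MinMaxScaler(feature_range=(0, 1))
scaled = feature_scaler.fit_transform(dataset)

# Pretend model output: one scaled Close value per test row, shape (40, 1)
fake_predictions = scaled[training_data_len:, [0]]

try:
    feature_scaler.inverse_transform(fake_predictions)
    mismatch_raised = False
except ValueError:
    # Same kind of shape mismatch as in the question's traceback
    mismatch_raised = True

# A second scaler fitted on the Close column only can invert (n, 1) arrays
target_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler.fit(dataset[:training_data_len, [0]])
scaled_close_test = target_scaler.transform(dataset[training_data_len:, [0]])
restored = target_scaler.inverse_transform(scaled_close_test)

print(mismatch_raised, restored.shape)
```

With two scalers, the model's (n, 1) output goes through target_scaler only, so the number of input features no longer matters for the inverse step.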

Updates relating to additional questions in the comments

The definition of y_test is inconsistent with y_train. Specifically, y_test is defined as y_test = dataset[training_data_len:, :], which uses all of the columns of dataset. Instead, to be consistent with y_train, it should be dataset[training_data_len:, 0].
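One way to keep the train and test sides consistent is to build both with the same helper and to apply the scaler fitted on x_train to x_test with transform rather than fit_transform. The following is a rough sketch under assumptions: make_windows and LOOKBACK are hypothetical names, and the linspace array is a toy stand-in for dataset[:, 0].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

LOOKBACK = 60  # same window length as in the question

def make_windows(series, lookback):
    """Return (x, y): x[k] holds `lookback` consecutive values, y[k] the value right after them."""
    x = np.array([series[i - lookback:i] for i in range(lookback, len(series))])
    y = np.array(series[lookback:])
    return x, y

# Toy stand-in for the unscaled Close column, dataset[:, 0] (assumption)
close = np.linspace(4700.0, 5300.0, 200)
training_data_len = len(close) - 40

# Both splits go through the same helper, so y_train and y_test are both 1-D
x_train, y_train = make_windows(close[:training_data_len], LOOKBACK)
x_test, y_test = make_windows(close[training_data_len - LOOKBACK:], LOOKBACK)

# Fit on the training windows only, then reuse the fitted scaler on the test windows
scaler = MinMaxScaler(feature_range=(0, 1))
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# LSTM layers expect (samples, timesteps, features)
x_train = x_train.reshape(x_train.shape[0], LOOKBACK, 1)
x_test = x_test.reshape(x_test.shape[0], LOOKBACK, 1)

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Because the targets are never scaled here, model.predict(x_test) can be compared with y_test directly.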

Handling splitting the data is often clearer and less error-prone if done in pandas:

# Starting with the dataframe 'data'
data = df.filter(['Close', 'Open', 'Volume'])

# Create x/y test/train directly from 'data'
training_data_len = len(data) - 40
x_train = data[['Open', 'Volume']][:training_data_len]
y_train = data.Close[:training_data_len]
x_test = data[['Open', 'Volume']][training_data_len:]
y_test = data.Close[training_data_len:]

# Then confirm you have the expected subsets by checking things like
# shape (and info(), describe(), etc.)
x_train.shape, x_test.shape
> ((160, 2), (40, 2))

y_train.shape, y_test.shape
> ((160,), (40,))
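For the LSTM in the question, the inputs still need a lookback dimension, and predicting two columns at once means y needs two columns. Below is a sketch of how the windowing could look with several features and several targets. It is an illustration only: make_multivariate_windows is a hypothetical helper, the arange array is toy data, and the column order Close, Open, Volume is assumed.

```python
import numpy as np

def make_multivariate_windows(values, lookback, target_cols):
    """Build x of shape (samples, lookback, n_features) and y of shape (samples, n_targets)."""
    x, y = [], []
    for i in range(lookback, len(values)):
        x.append(values[i - lookback:i, :])   # all features for the previous `lookback` rows
        y.append(values[i, target_cols])      # selected target columns for the next row
    return np.array(x), np.array(y)

# Toy stand-in for data.values; assumed column order: Close, Open, Volume
values = np.arange(200 * 3, dtype=float).reshape(200, 3)

# Targets: column 0 (Close) and column 2 (Volume), in that order
x, y = make_multivariate_windows(values, lookback=60, target_cols=[0, 2])

print(x.shape, y.shape)
```

With arrays shaped like this, the first LSTM layer would take input_shape=(60, 3) and the last layer would be Dense(2). The columns of model.predict then follow the order of target_cols, so here column 0 would be Close and column 1 would be Volume.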

Upvotes: 2
