Reputation: 351
I am constructing an LSTM predictor with Keras. My input array is historical price data. I segment the data into window_size blocks in order to predict prediction_length blocks ahead. My data is a list of 4246 floating point numbers. I separate my data into 4055 arrays, each of length 168, in order to predict 24 units ahead (4246 - 168 - 24 + 1 = 4055 windows). This gives me an x_train set with dimension (4055, 168). I then scale my data and try to fit it, but run into a dimension error.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Activation

df = pd.DataFrame(data)
print(f"Len of df: {len(df)}")
min_max_scaler = MinMaxScaler()
H = 24                    # prediction length
window_size = 7 * H       # 168
num_pred_blocks = len(df) - window_size - H + 1
x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)]
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)]
    y_train.append(y_train_block)
LEN = int(len(x_train) * window_size)
x_train = min_max_scaler.fit_transform(x_train)
batch_size = 1

def build_model():
    model = Sequential()
    model.add(LSTM(input_shape=(window_size, batch_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=50)
The error returned is:
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 4055 arrays: [array([[0.00630006],
Am I not segmenting correctly? Loading correctly? Should the number of units be different than the number of prediction blocks? I appreciate any help. Thanks.
The suggestions to convert them to NumPy arrays are correct, but MinMaxScaler() already returns a NumPy array. I reshaped the arrays into the proper dimensions, but now I am running into a CUDA memory error. I consider the original problem solved. Thank you.
df = pd.DataFrame(data)
min_max_scaler = MinMaxScaler()
H = prediction_length
window_size = 7 * H
num_pred_blocks = len(df) - window_size - H + 1
x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)].values
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)].values
    y_train.append(y_train_block)
x_train = min_max_scaler.fit_transform(x_train)
y_train = min_max_scaler.fit_transform(y_train)   # note: this refits the same scaler on y
# reshape to the (samples, timesteps, features) layout Keras expects
x_train = np.reshape(x_train, (len(x_train), 1, window_size))
y_train = np.reshape(y_train, (len(y_train), 1, H))
batch_size = 1

def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=100))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=50)
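A quick sanity check before fitting (not in the original post; it assumes the arrays and build_model() defined above) is to print the shapes and the model summary:
print(x_train.shape)   # (4055, 1, 168) -> (samples, timesteps, features)
print(y_train.shape)   # (4055, 1, 24)
model.summary()        # LSTM: (1, 1, 100); TimeDistributed(Dense): (1, 1, 24)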
Upvotes: 2
Views: 999
Reputation: 2192
That's probably because x_train and y_train were not converted to NumPy arrays. Take a closer look at this issue on GitHub.
model = build_model()
# stack the Python lists into single ndarrays before fitting
x_train, y_train = np.array(x_train), np.array(y_train)
model.fit(x_train, y_train, batch_size=batch_size, epochs=50)
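For illustration, a minimal sketch (toy random data, not the poster's) of why the conversion matters: Keras interprets a Python list as multiple separate inputs, while np.array() stacks it into one array:
import numpy as np

x_list = [np.random.rand(168) for _ in range(4055)]  # a list of 4055 arrays, as in the question
x_arr = np.array(x_list)                             # one stacked ndarray
print(len(x_list))    # 4055 separate arrays
print(x_arr.shape)    # (4055, 168): a single array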
Upvotes: 1
Reputation: 1749
I don't think you passed the batch size to the model correctly. input_shape=(window_size, batch_size) specifies the data dimensions, which is the right idea, but the batch size does not belong there; you should use input_shape=(window_size, 1). If you want to fix the batch size, you have to add it as another dimension, like this: LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2])) (cited from the Keras documentation). In your case:
def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model
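And a minimal self-contained sketch of the cited batch_input_shape pattern, with toy shapes standing in for real data:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

n_batch, n_neurons = 1, 100
X = np.random.rand(8, 1, 168)   # (samples, timesteps, features)

model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2])))
model.compile(loss="mse", optimizer="rmsprop")
model.summary()                 # output shape: (1, 100)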
You also need to change the dimensions of your data to (batch_dim, data_dim_1, data_dim_2). I use numpy, so numpy.reshape() will work. First your data should be row-wise, so each row has a shape of (1, 168); then add the batch dimension, and it becomes (batch_n, 1, 168).
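For example, with the poster's dimensions (a sketch; the random array stands in for the scaled training windows):
import numpy as np

x_train = np.random.rand(4055, 168)            # stand-in for the scaled (4055, 168) windows
x_train = np.reshape(x_train, (4055, 1, 168))  # add the middle dimension: (batch_n, 1, 168)
print(x_train.shape)                           # (4055, 1, 168)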
Hope this helps.
Upvotes: 1