Bryce Wayne

Reputation: 351

Keras LSTM Input/Output Dimension

I am constructing an LSTM predictor with Keras. My input is an array of historical price data. I segment the data into window_size blocks in order to predict prediction_length blocks ahead. My data is a list of 4246 floating point numbers. I separate my data into 4055 arrays, each of length 168, in order to predict 24 units ahead.
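As a quick sanity check of that window arithmetic (a minimal sketch, using only the numbers quoted above):

series_len = 4246          # length of the raw price series
H = 24                     # prediction length
window_size = 7 * H        # 168
num_pred_blocks = series_len - window_size - H + 1
print(num_pred_blocks)     # 4055, so x_train is (4055, 168) and y_train is (4055, 24)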

This gives me an x_train set with dimension (4055, 168). I then scale the data and try to fit it, but I run into a dimension error.

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Activation

df = pd.DataFrame(data)
print(f"Len of df: {len(df)}")
min_max_scaler = MinMaxScaler()
H = 24

window_size = 7*H
num_pred_blocks = len(df)-window_size-H+1

x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)]
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)]
    y_train.append(y_train_block)

LEN = int(len(x_train)*window_size)
x_train = min_max_scaler.fit_transform(x_train)
batch_size = 1
    
def build_model():
    model = Sequential()
    model.add(LSTM(input_shape=(window_size,batch_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model
    
num_epochs = epochs
model= build_model()
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)

The error returned is as follows.

ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 4055 arrays: [array([[0.00630006],

Am I not segmenting correctly? Loading correctly? Should the number of units be different than the number of prediction blocks? I appreciate any help. Thanks.

Edit

The suggestions to convert them to NumPy arrays are correct, but MinMaxScaler() already returns a NumPy array. I reshaped the arrays into the proper dimensions, but now my computer is running into a CUDA memory error. I consider the problem solved. Thank you.

df = pd.DataFrame(data)
min_max_scaler = MinMaxScaler()
H = prediction_length

window_size = 7*H
num_pred_blocks = len(df)-window_size-H+1

x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)].values
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)].values
    y_train.append(y_train_block)

x_train = min_max_scaler.fit_transform(x_train)
y_train = min_max_scaler.fit_transform(y_train)
x_train = np.reshape(x_train, (len(x_train), 1, window_size))
y_train = np.reshape(y_train, (len(y_train), 1, H))
batch_size = 1

def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=100))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

num_epochs = epochs
model = build_model()
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)

Upvotes: 2

Views: 999

Answers (2)

Daniele Cappuccio

Reputation: 2192

That's probably because x_train and y_train were not converted to NumPy arrays. Take a closer look at this issue on GitHub.

model = build_model()
x_train, y_train = np.array(x_train), np.array(y_train)
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)

Upvotes: 1

Bill Chen

Reputation: 1749

I don't think you passed the batch size to the model correctly.

input_shape=(window_size, batch_size) is the data dimension, which only happens to work here because batch_size is 1; you should use input_shape=(window_size, 1) instead.

If you want to specify the batch size, you have to add another dimension, like this: LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2])) (cited from the Keras documentation).

In your case:

def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

You also need to reshape your data so its dimensions are (batch_dim, data_dim_1, data_dim_2). I use NumPy, so numpy.reshape() will work.

First, your data should be row-wise, so each row should have a shape of (1, 168); after adding the batch dimension it becomes (batch_n, 1, 168).
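For example, a minimal NumPy sketch of that reshape (the shapes assume the 4055 windows of length 168 from the question; the random data is just a stand-in):

import numpy as np

x_train = np.random.rand(4055, 168)             # stand-in for the scaled windows
x_train = np.reshape(x_train, (4055, 1, 168))   # (batch_n, 1, window_size)
print(x_train.shape)                            # (4055, 1, 168)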

Hope this helps.

Upvotes: 1
