NomadCrypto

Reputation: 153

understanding shapes for Keras model

I am trying to wrap my head around the shape needed for my specific task. I am attempting to train a Q-learner on some time series data contained in a dataframe. My dataframe has the following columns: open, close, high, low, and I am trying to extract a sliding window of, say, 50 timesteps. Here is example code for one window:

import numpy as np

# df is the OHLC dataframe described above
window = df.iloc[0:50]

df_norm = (window - window.mean()) / (window.max() - window.min())

x = df_norm.values
x = np.expand_dims(x, axis=0)  # add a batch dimension
print x.shape
# (1, 50, 4)
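In case it matters, here is roughly how I build the full set of windows (a sketch; window_size and X are just names I'm using for illustration):

window_size = 50
windows = []
for start in range(len(df) - window_size + 1):
    window = df.iloc[start:start + window_size]
    norm = (window - window.mean()) / (window.max() - window.min())
    windows.append(norm.values)

X = np.array(windows)
# X.shape == (num_windows, 50, 4)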

Now that I know the shape is (1, 50, 4) for each item in X, I'm at a loss as to what shape to feed my model. Let's say I have the following:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(50, 4)))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(num_actions))

This gives the following error:

ValueError: could not broadcast input array from shape (50,4) into shape (1,50)

And here is another attempt:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import sgd

model = Sequential()
model.add(Dense(hidden_size, input_shape=(50, 4), activation='relu'))
model.add(Dense(hidden_size, activation='relu'))
model.add(Dense(num_actions))
model.compile(sgd(lr=.2), "mse")

which gives the following error:

ValueError: could not broadcast input array from shape (50,4) into shape (1,50)

Here is the shape the model is expecting and the state from my env:

print "Inputs: {}".format(model.input_shape)
print "actual: {}".format(env.state.shape)
#Inputs: (None, 50, 4)
#actual: (1, 50, 4)

Can someone explain where I am going wrong with the shapes here?

Upvotes: 4

Views: 936

Answers (1)

snakile

Reputation: 54521

A recurrent layer takes inputs of shape (batch_size, timesteps, input_features). Since the shape of x is (1, 50, 4), the data is interpreted as a batch of one sample with 50 timesteps, each containing 4 features. When initializing the first layer of a model, you pass an input_shape: a tuple specifying the shape of the input excluding the batch_size dimension. In the case of LSTM layers, you can pass None as the timesteps dimension, which lets the layer accept windows of any length. Hence, this is how the first layer of the network should be initialized:

model.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
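As a quick sanity check (a sketch; the zero inputs are arbitrary), a layer built this way accepts windows of any length, since the timesteps dimension is left free:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

m = Sequential()
m.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))

print m.predict(np.zeros((1, 50, 4))).shape  # (1, 50, 32)
print m.predict(np.zeros((1, 10, 4))).shape  # (1, 10, 32)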

The second LSTM layer is followed by a dense layer, so it shouldn't return sequences: the Dense layer expects a single vector per sample, not one vector per timestep. Hence, this is how you should initialize the second LSTM layer:

model.add(LSTM(32))
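You can see the effect on the output shape (a sketch using model.output_shape):

m = Sequential()
m.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
m.add(LSTM(32))  # return_sequences defaults to False
print m.output_shape
# (None, 32): a single 32-dimensional vector per sample, ready for Dense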

Each window of 50 timesteps in x is supposed to be mapped to a single action vector in y. Therefore, since the shape of x is (1, 50, 4), the shape of y must be (1, num_actions). Make sure y doesn't have a timesteps dimension.
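For instance (a quick sketch; num_actions = 3 is just an example value):

import numpy as np

num_actions = 3                  # example value
y = np.zeros((1, num_actions))   # right: one action vector per sample
# a y of shape (1, 50, num_actions) would wrongly carry a timesteps axis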

Therefore, under the assumption that x and y have the right shapes, the following code should work:

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import sgd

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
model.add(LSTM(32))
model.add(Dense(num_actions))

model.compile(sgd(lr=.2), "mse")

# x.shape == (1, 50, 4)
# y.shape == (1, num_actions)

history = model.fit(x, y)

Upvotes: 2
