Quetzalcoatl

Reputation: 2146

Keras multi-step LSTM batch train classification at each step

Question

How to batch train a multi-step LSTM in Keras for single-label multi-class classification at each time-step, with > 2 classes?

Current Error

Each target batch is a 3-dimensional array of shape (batch_size, n_time_steps, n_classes), but Keras expects a 2-dimensional array.

Example/Context

Suppose we have daily closing prices for N stocks, with m features per day and stock, and one of three actions: "bought", "held", or "sold". If there are 30 days' worth of data per stock, we can train an LSTM to predict each action (on each day, for each stock) as follows.

For each batch of samples of size n << N, X_train will have a shape of (n, 30, m), i.e. n samples, 30 time-steps, and m features. After one-hot encoding "bought", "held", and "sold", Y_train will have a shape of (n, 30, 3), i.e. a 3-dimensional array.
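
To make the target construction concrete, here is a minimal sketch (the integer coding 0/1/2 for the actions is an assumption, not from the question) showing that keras.utils.to_categorical one-hot encodes along a new last axis:

import numpy as np
from keras.utils import to_categorical

n, n_time_steps, n_classes = 256, 30, 3

# Hypothetical integer labels per day: 0 = "bought", 1 = "held", 2 = "sold"
labels = np.random.randint(0, n_classes, size=(n, n_time_steps))

# to_categorical adds the class axis, turning (n, 30) into (n, 30, 3)
Y_train = to_categorical(labels, num_classes=n_classes)
print(Y_train.shape)  # (256, 30, 3)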

The problem is that Keras raises an error because it expects Y_train to be 2-dimensional.

Here is a code snippet:

from keras.models import Sequential
from keras.layers import LSTM, Dense

n_time_steps = 30
n_ftrs = 700
n_neurons = 100
n_classes = 3
batch_size = 256
n_epochs = 500
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(n_time_steps, n_ftrs)))
model.add(Dense(n_classes, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', 
    metrics=['accuracy'])

for e in range(n_epochs):
  X_train, Y_train = BatchGenerator()
  # Y_train.shape = (256, 30, 3)
  model.fit(X_train, Y_train, batch_size=batch_size, epochs=1)  # 'epochs' replaces the deprecated 'nb_epoch'
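
BatchGenerator is not shown in the question; a hypothetical stand-in that returns random data with the stated shapes (just so the snippet can be run end-to-end) could look like this:

import numpy as np

def BatchGenerator():
    # Placeholder data with the shapes described above:
    # features (256, 30, 700) and one-hot targets (256, 30, 3)
    X = np.random.rand(batch_size, n_time_steps, n_ftrs).astype('float32')
    labels = np.random.randint(0, n_classes, size=(batch_size, n_time_steps))
    Y = np.eye(n_classes)[labels]  # one-hot encode to (256, 30, 3)
    return X, Y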

Error

Error when checking target: expected dense_20 to have 2 dimensions, 
but got array with shape (256, 30, 3)

Upvotes: 2

Views: 1658

Answers (2)

today

Reputation: 33470

If you take a look at the model.summary() output, you will see what the problem is:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100)               320400    
_________________________________________________________________
dense_74 (Dense)             (None, 3)                 303       
=================================================================
Total params: 320,703
Trainable params: 320,703
Non-trainable params: 0
_________________________________________________________________

As you can see, the output shape of the LSTM layer is (None, 100), which means that only the output of the last timestep is returned. As a result, the output shape of the Dense layer is (None, 3), which means that it would classify the whole input timeseries (i.e. the whole 30-day history of a stock) into one of 3 classes. This is not what you want; rather, you want to classify each timestep of the input timeseries. To make this happen, as @VegardKT suggested, you can pass return_sequences=True to the LSTM layer to get its output at each timestep. Let's look at the model.summary() output after this change:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_2 (LSTM)                (None, 30, 100)           320400    
_________________________________________________________________
dense_75 (Dense)             (None, 30, 3)             303       
=================================================================
Total params: 320,703
Trainable params: 320,703
Non-trainable params: 0
_________________________________________________________________

As you can see, the LSTM layer now gives its output at each timestep, and therefore the Dense layer, which acts as a classifier, can classify each of those timesteps into one of 3 classes, as desired.
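
For reference, here is a sketch of the question's model with this change applied (note that softmax is also swapped in for the question's sigmoid, since that is the conventional output activation for single-label multi-class targets; the answer above changes only return_sequences):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# return_sequences=True makes the LSTM emit an output at every timestep,
# so the Dense classifier is applied to each of the 30 days separately
model.add(LSTM(n_neurons, input_shape=(n_time_steps, n_ftrs),
               return_sequences=True))
model.add(Dense(n_classes, activation='softmax'))  # softmax swapped in for sigmoid
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])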

Upvotes: 3

VegardKT

Reputation: 1246

You need to change your LSTM layer like so:

model.add(LSTM(n_neurons, input_shape=(n_time_steps, n_ftrs), return_sequences=True))

This argument does the following:

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.

I'm going to be completely honest and say I'm not sure why it is like this; my LSTM knowledge is a bit rusty, as I would have believed it should be the other way around, but I am able to get your code running like this. If someone would like to clarify why this works, that would be great.
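
For what it's worth, a minimal sketch (random input, untrained weights, shapes only) shows the difference between the two settings:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

x = np.random.rand(1, 30, 700).astype('float32')

last_only = Sequential([LSTM(100, input_shape=(30, 700))])
print(last_only.predict(x).shape)  # (1, 100): last timestep only

full_seq = Sequential([LSTM(100, input_shape=(30, 700), return_sequences=True)])
print(full_seq.predict(x).shape)   # (1, 30, 100): output at every timestep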

Upvotes: 1
