Felix

Reputation: 2678

LSTM many-to-many training in batches of independent examples

I'm still figuring out LSTMs and trying to come up with a suitable training routine and data shape.

A time series represents musical notes; let's call it a song. The notes are one-hot encoded, so one series has shape (timesteps, features). Twelve copies of the series are then made by transposing it (shifting its notes up), so one song takes the shape (12, timesteps, features). Each of these twelve series should be trained on independently. In addition, there are multiple songs that vary in length.
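
To make the shapes concrete, here is a rough numpy sketch of that preparation. Approximating a transposition as a roll along the one-hot feature axis is my simplification; notes at the top of the range wrap around, which a real implementation would handle properly.

import numpy as np

# song: one one-hot encoded series, shape (timesteps, features)
def make_transpositions(song, n=12):
    # Transposing up by k semitones ~ rolling the feature (pitch) axis by k
    return np.stack([np.roll(song, k, axis=-1) for k in range(n)])

# song_batch = make_transpositions(song)  # shape (12, timesteps, features)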

I'd like to train an LSTM such that a prediction is made at every time step of a series. The training data for one of the twelve series would then be X = series[:-1, :], Y = series[1:, :], and similarly for all twelve versions.

# Example data, numbers not one-hot encoded for brevity
series = [1, 3, 2, 4, 7, 7, 10]
X = [1, 3, 2, 4, 7, 7]
Y = [3, 2, 4, 7, 7, 10]   # Shifted 1 step back

The twelve variations form a natural batch, as their lengths do not vary. My question is: can the training be arranged so that these variants are fed to the network as a batch of twelve, while the training is still performed many-to-many (one prediction per time step)?
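
In array terms, I imagine something like the following, assuming the stacked variations live in song_batch with shape (12, timesteps, features):

X = song_batch[:, :-1, :]   # (12, timesteps - 1, features)
Y = song_batch[:, 1:, :]    # (12, timesteps - 1, features)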

Currently I have what seems to be a naïve approach for a single example. It feeds the time steps to the network one by one, preserving state in between:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# X = (12 * timesteps, 1, features), Y = (12 * timesteps, features)
model = Sequential()
model.add(LSTM(256, input_shape=(None, X.shape[-1]), batch_size=1, stateful=True))
model.add(Dense(Y.shape[-1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])

# One epoch at a time so the LSTM state can be reset between full passes
for epoch in range(10):
    model.fit(X, Y, epochs=1, batch_size=1, shuffle=False)
    model.reset_states()

How might this training regime be achieved for a single song of twelve variations?

Upvotes: 0

Views: 1198

Answers (1)

today

Reputation: 33470

As you mentioned in your comment, you need to wrap an LSTM layer inside TimeDistributed. This way, each of the 12 variations is processed individually. Further, since each feature vector is one-hot encoded, we add a Dense layer with a softmax activation as the last layer of our network:

from keras import models, layers

n_features = 20  # size of the one-hot note vocabulary

# One sample = one song: 12 variations, a variable number of timesteps, one-hot features
model_input = layers.Input(shape=(12, None, n_features))
# The same LSTM is applied to each of the 12 variations independently
x = layers.TimeDistributed(layers.LSTM(64, return_sequences=True))(model_input)
# Softmax over the note vocabulary at every time step of every variation
model_output = layers.Dense(n_features, activation='softmax')(x)

model = models.Model([model_input], [model_output])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.summary()

Here is the model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 12, None, 20)      0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, None, 64)      21760     
_________________________________________________________________
dense_1 (Dense)              (None, 12, None, 20)      1300      
=================================================================
Total params: 23,060
Trainable params: 23,060
Non-trainable params: 0
_________________________________________________________________
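
For completeness, here is a hedged sketch of how a single song (one sample holding all twelve variations) could be fed to this model; the dummy data and variable names are illustrative assumptions, not part of your setup:

import numpy as np

timesteps = 50  # arbitrary song length for the dummy data
notes = np.random.randint(0, n_features, size=(12, timesteps))
song = np.eye(n_features)[notes]   # one-hot, shape (12, timesteps, n_features)

X = song[None, :, :-1, :]          # inputs:  all steps but the last
Y = song[None, :, 1:, :]           # targets: all steps but the first

model.fit(X, Y, epochs=10)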

Note that this model may be too simple for your problem. You may wish to stack more LSTM layers on top of each other and tune the parameters to get better accuracy, depending on the specific problem you are trying to solve (in the end, you must experiment!); but it gives you a rough idea of what a model may look like in this scenario. Although it may seem slightly off-topic, I suggest you read the Seq2Seq tutorial on the official Keras blog to get more ideas in this regard.
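
For instance, a deeper variant could be sketched like this (the layer sizes are arbitrary assumptions), reusing the names from the code above:

x = layers.TimeDistributed(layers.LSTM(128, return_sequences=True))(model_input)
x = layers.TimeDistributed(layers.LSTM(64, return_sequences=True))(x)
model_output = layers.Dense(n_features, activation='softmax')(x)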

As a side note, if you are using a GPU then you can use the CuDNNLSTM layer instead of LSTM; it gives much better performance on a GPU.
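
The swap is minimal; a sketch, assuming Keras 2.x with the TensorFlow backend and a CUDA-capable GPU:

from keras.layers import CuDNNLSTM

# CuDNNLSTM is a drop-in replacement here (it does not accept all LSTM arguments)
x = layers.TimeDistributed(CuDNNLSTM(64, return_sequences=True))(model_input)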

Upvotes: 1
