MysteryGuy

Reputation: 1161

How to prepare data for stateful LSTM in Keras?

I would like to develop a time series approach for binary classification, with a stateful LSTM in Keras.

Here is how my data look. I have a number of recordings, say N. Each recording consists of 22 time series of length M_i (i=1,...,N). I want to use a stateful model in Keras, but I don't know how to reshape my data, especially how I should define my batch_size.

Here is how I proceeded for the stateless LSTM. I created sequences of length look_back for all the recordings, so that I had data of shape (N*(M_i-look_back), look_back, 22=n_features).

Here is the function I used for that purpose:

import numpy as np

def create_dataset(feat, targ, look_back=1):
    # build sliding windows of length look_back and pair each window
    # with the target of its last time step
    dataX, dataY = [], []
    for i in range(len(targ) - look_back):
        a = feat[i:(i + look_back), :]          # window of shape (look_back, n_features)
        dataX.append(a)
        dataY.append(targ[i + look_back - 1])   # target of the last step in the window
    return np.array(dataX), np.array(dataY)

where feat is the 2-D data array of size (n_samples, n_features) (for each recording) and targ is the target vector.
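
For example, for a single recording (the numbers below are hypothetical), this gives:

import numpy as np

# hypothetical recording: 1000 time steps, 22 features
feat = np.random.randn(1000, 22)
targ = np.random.randint(0, 2, size=1000)

X, y = create_dataset(feat, targ, look_back=50)
print(X.shape)  # (950, 50, 22)
print(y.shape)  # (950,)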

So, my question is: based on the data described above, how do I reshape them for a stateful model, taking the batch notion into account? Are there any precautions to take?

What I want is to be able to classify each time step of each recording as seizure/not seizure.

EDIT: Another problem I thought about: my recordings contain sequences of different lengths. My stateful model could learn long-term dependencies on each recording, which means a different batch_size from one recording to another... How do I deal with that? Won't it cause generalization trouble when tested on completely different sequences (test set)?

Thanks

Upvotes: 3

Views: 1739

Answers (1)

Daniel Möller

Reputation: 86630

I don't think you need a stateful layer for your purpose.

If you want long-term learning, simply don't create these sliding windows. Have your data shaped as:

(number_of_independent_sequences, length_or_steps_of_a_sequence, variables_or_features_per_step)

I'm not sure I understand the wording in your question correctly. If a "recording" is like a "movie", a "song", a "voice clip" or something like that, then:

  • number of sequences = number of recordings

Following that idea of "recording", the time steps will be the frames in a video, or the samples (time x sample_rate for 1 channel) in an audio file. (Be careful: "samples" in Keras are "sequences/recordings", while "samples" in audio processing are "steps" in Keras.)

  • time_steps = number of frames or audio samples

Finally, the number of features/variables. In a movie, it's like the RGB channels (3 features); in audio, it's the number of channels (2 in stereo). In other kinds of data they may be temperature, pressure, etc.

  • features = number of variables measured in each step

Having your data shaped like this will work for both stateful=True and stateful=False.

These two methods of training are equivalent:

# with stateful=False
model.fit(X, Y, batch_size=batch_size)

# with stateful=True, resetting the states after every batch makes each batch independent
for start in range(0, len(X), batch_size):
    model.train_on_batch(X[start:start+batch_size], Y[start:start+batch_size])
    model.reset_states()

The only differences might be in the way the optimizer state is updated across batches.
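
For completeness, a minimal sketch of what stateful=True would require on the model side: the batch size must be fixed in advance via batch_input_shape (layer sizes and the batch size below are hypothetical):

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 8  # hypothetical; every batch must contain exactly this many sequences

model = Sequential()
# stateful layers need the batch size declared in batch_input_shape
model.add(LSTM(32, stateful=True, return_sequences=True,
               batch_input_shape=(batch_size, None, 22)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')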

For your case, if you can create such input data shaped as mentioned and you're not going to recursively predict the future, I don't see a reason to use stateful=True.

Classifying every step

For classifying every step, you don't need to create sliding windows, and it's not necessary to use stateful=True either.

Recurrent layers have an option to output all time steps, by setting return_sequences=True.

If you have an input with shape (batch, steps, features), you will need targets with shape (batch, steps, 1), which is one class per step.

In short, you need:

  • LSTM layers with return_sequences=True
  • X_train with shape (files, total_eeg_length, 22)
  • Y_train with shape (files, total_eeg_length, 1)

Hint: since LSTMs never classify the beginning of a sequence very well, you can try using Bidirectional(LSTM(...)) layers.
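
Putting those points together (including the Bidirectional hint), a minimal sketch of such a model could look like this; layer sizes and the dummy data are just placeholders:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional, TimeDistributed

# hypothetical data: 10 EEGs, all of length 500, with 22 channels
X_train = np.random.randn(10, 500, 22)
Y_train = np.random.randint(0, 2, size=(10, 500, 1))

model = Sequential()
# return_sequences=True keeps one output per time step;
# Bidirectional helps with the early steps
model.add(Bidirectional(LSTM(64, return_sequences=True), input_shape=(None, 22)))
# one sigmoid unit per step -> seizure / not seizure
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(optimizer='adam', loss='binary_crossentropy')

model.fit(X_train, Y_train, epochs=2, batch_size=5)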

Inputs with different lengths

For using inputs with different lengths, you need to set input_shape=(None, features). Considering our discussion in the chat, features = 22.

You can then:

  • Load each EEG individually:

    • X_train as (1, eeg_length, 22)
    • Y_train as (1, eeg_length, 1)
    • Train each EEG separately with model.train_on_batch(array, targets).
    • You will need to manage epochs manually and use test_on_batch for validation data.
  • Pad the shorter EEGs with zeros or another dummy value until they all reach the max_eeg_length and use:

    • a Masking layer at the beginning of the model to discard the steps with the dummy value.
    • X_train as (eegs, max_eeg_length, 22)
    • Y_train as (eegs, max_eeg_length, 1)
    • You can train with a regular model.fit(X_train, Y_train, ...) (see the sketch after this list).
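
A minimal sketch of the second option, padding + Masking (recording lengths and layer sizes here are hypothetical; in practice, pick a dummy value that cannot occur in your real data):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense
from keras.preprocessing.sequence import pad_sequences

# hypothetical: three EEGs with different lengths, 22 channels each
lengths = [300, 450, 500]
eegs = [np.random.randn(n, 22) for n in lengths]
targets = [np.random.randint(0, 2, size=(n, 1)) for n in lengths]

# pad every recording with zeros up to the longest one
X_train = pad_sequences(eegs, padding='post', dtype='float32', value=0.0)
Y_train = pad_sequences(targets, padding='post', value=0)

model = Sequential()
# Masking skips the steps whose features are all equal to the dummy value
model.add(Masking(mask_value=0.0, input_shape=(None, 22)))
model.add(LSTM(64, return_sequences=True))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

model.fit(X_train, Y_train, epochs=2, batch_size=2)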

Upvotes: 7
