Reputation: 1161
I would like to develop a time series approach for binary classification, with stateful LSTM in Keras
Here is how my data look: I have many recordings, say N of them. Each recording consists of 22 time series of length M_i (i=1,...,N). I want to use a stateful model in Keras, but I don't know how to reshape my data, and in particular how I should define my batch_size.
Here is how I proceeded for a stateless LSTM: I created sequences of length look_back for all the recordings, so that I had data of size (N*(M_i-look_back), look_back, 22=n_features).
Here is the function I used for that purpose:
import numpy as np

def create_dataset(feat, targ, look_back=1):
    # Build sliding windows of length look_back over one recording
    dataX, dataY = [], []
    for i in range(len(targ) - look_back):
        a = feat[i:(i + look_back), :]           # window of features
        dataX.append(a)
        dataY.append(targ[i + look_back - 1])    # label of the last step in the window
    return np.array(dataX), np.array(dataY)
where feat
is the 2-D data array of size (n_samples, n_features)
(for each recording) and targ
is the target vector.
So, my question is: based on the data described above, how should I reshape the data for a stateful model, taking the notion of batches into account? Are there precautions to take?
What I want to do is classify each time step of each recording as seizure/not seizure.
EDIT: Another problem I thought about: my recordings contain sequences of different lengths. A stateful model could learn long-term dependencies within each recording, but that would mean a different batch_size from one recording to another... How do I deal with that? Won't it cause generalization trouble when tested on completely different sequences (the test set)?
Thanks
Upvotes: 3
Views: 1739
Reputation: 86630
I don't think you need a stateful layer for your purpose.
If you want long term learning, simply don't create these sliding windows. Have your data shaped as:
(number_of_independent_sequences, length_or_steps_of_a_sequence, variables_or_features_per_step)
I'm not sure I understand the wording in your question correctly. If a "recording" is like a "movie", a "song", a "voice clip" or something like that, then:
Following that idea of "recording", the time steps will be the "frames" in a video, or the "samples" (duration x sample_rate, for one channel) in an audio file. (Be careful: "samples" in Keras are "sequences/recordings", while "samples" in audio processing are "steps" in Keras.)
Finally, the number of features/variables. In a movie, these are like the RGB channels (3 features); in audio, also the number of channels (2 in stereo). In other kinds of data they may be temperature, pressure, etc.
Having your data shaped like this will work for both stateful = True and False.
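As a sketch of that shaping (using made-up sizes, and assuming for simplicity that all recordings have equal length), you would stack the recordings directly, with no sliding windows:

```python
import numpy as np

# Hypothetical sizes, just for illustration
N, M, n_features = 5, 100, 22

# One (M, 22) array per recording, one 0/1 label per time step
recordings = [np.random.randn(M, n_features) for _ in range(N)]
labels = [np.random.randint(0, 2, size=(M, 1)) for _ in range(N)]

# Stack into the (sequences, steps, features) layout
X = np.stack(recordings)   # shape (5, 100, 22)
Y = np.stack(labels)       # shape (5, 100, 1)
print(X.shape, Y.shape)
```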
These two methods of training are equivalent:
#with stateful=False
model.fit(X, Y, batch_size=batch_size)

#with stateful=True
for start in range(0, len(X), batch_size):
    model.train_on_batch(X[start:start+batch_size], Y[start:start+batch_size])
    model.reset_states()
The only differences may be in the way the optimizer updates are applied.
For your case, if you can create input data shaped as mentioned and you're not going to recursively predict the future, I don't see a reason to use stateful=True.
For classifying every step, you don't need to create sliding windows, and it's also not necessary to use stateful=True.
Recurrent layers have an option to output all time steps: set return_sequences=True.
If you have an input with shape (batch, steps, features), you will need targets with shape (batch, steps, 1), which is one class per step.
In short, you need:
- return_sequences=True
- X_train with shape (files, total_eeg_length, 22)
- Y_train with shape (files, total_eeg_length, 1)
Hint: as LSTMs never classify the beginning of a sequence very well, you can try using Bidirectional(LSTM(...)) layers.
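A minimal per-step classifier along these lines might look as follows (the layer sizes are placeholders, not taken from the question; only n_features=22 comes from the data described above):

```python
import numpy as np
from tensorflow.keras import layers, models

n_features = 22  # channels per time step, as in the question

# Sketch: bidirectional LSTM emitting one prediction per time step.
# input_shape=(None, n_features) accepts sequences of any length.
model = models.Sequential([
    layers.Input(shape=(None, n_features)),
    layers.Bidirectional(layers.LSTM(16, return_sequences=True)),
    layers.Dense(1, activation='sigmoid'),  # seizure probability per step
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# One dummy recording with 50 steps -> 50 per-step predictions
x = np.random.randn(1, 50, n_features)
print(model.predict(x, verbose=0).shape)  # (1, 50, 1)
```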
For using inputs with different lengths, you need to set input_shape=(None, features). Considering our discussion in the chat, features = 22.
You can then either:
Load each EEG individually:
- X_train as (1, eeg_length, 22)
- Y_train as (1, eeg_length, 1)
- model.train_on_batch(array, targets), and .test_on_batch for validation data.
Or pad the shorter EEGs with zeros or another dummy value until they all reach the max_eeg_length, and use:
- a Masking layer at the beginning of the model to discard the steps with the dummy value
- X_train as (eegs, max_eeg_length, 22)
- Y_train as (eegs, max_eeg_length, 1)
- model.fit(X_train, Y_train, ...)
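A sketch of the padding option, with hypothetical EEG lengths and 0.0 as the dummy value (the helper pad() is made up for illustration):

```python
import numpy as np
from tensorflow.keras import layers, models

n_features = 22
eeg_lengths = [30, 50, 40]          # hypothetical, unequal lengths
max_eeg_length = max(eeg_lengths)

def pad(seqs, width, length):
    # Pad each sequence with zeros up to a common length
    out = np.zeros((len(seqs), length, width))
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out

# Shift the dummy EEGs so real steps are never all-zero
eegs = [np.random.randn(L, n_features) + 1.0 for L in eeg_lengths]
targs = [np.random.randint(0, 2, size=(L, 1)) for L in eeg_lengths]
X_train = pad(eegs, n_features, max_eeg_length)   # (3, 50, 22)
Y_train = pad(targs, 1, max_eeg_length)           # (3, 50, 1)

# The Masking layer makes the LSTM skip the all-zero padded steps
model = models.Sequential([
    layers.Input(shape=(max_eeg_length, n_features)),
    layers.Masking(mask_value=0.0),
    layers.LSTM(16, return_sequences=True),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, Y_train, epochs=1, verbose=0)
```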
Upvotes: 7