Reputation: 3375
I know this question has been asked many times, but I truly can't fix this input shape issue for my case.
My x_train shape == (5523000, 13) // (13 timeseries of length 5523000)
My y_train shape == (5523000, 1)
number of classes == 2
To reshape the x_train and y_train:
x_train = x_train.values.reshape(27615, 200, 13)  # 5523000 / 200 = 27615
y_train = y_train.values.reshape((5523000, 1))    # I know I have a problem here but I don't know how to fix it
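To make the shape mismatch concrete, here is a scaled-down sketch with dummy data (1000 timesteps instead of 5523000, windows of 200, so the numbers are placeholders for my real sizes):

```python
import numpy as np

# Dummy stand-ins for the real data (same idea, smaller numbers):
# 1000 timesteps of 13 signals, windowed into chunks of 200.
x = np.random.rand(1000, 13)
y = np.random.randint(0, 2, size=(1000, 1))

x_windows = x.reshape(1000 // 200, 200, 13)  # -> (5, 200, 13)
print(x_windows.shape)  # (5, 200, 13)
print(y.shape)          # (1000, 1): still one label per timestep,
                        # not one per 200-step window
```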
Here is my LSTM network:
def lstm_baseline(x_train, y_train):
    batch_size = 200
    model = Sequential()
    model.add(LSTM(batch_size, input_shape=(27615, 200, 13),
                   activation='relu', return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(128, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='softmax'))
    model.compile(
        loss='categorical_crossentropy',
        optimizer='rmsprop',
        metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=15)
    return model
Whenever I run the code I get this error :
ValueError: Input 0 is incompatible with layer lstm_10: expected ndim=3, found ndim=4
My question is: what am I missing here?
PS: The idea of the project is that I have 13 signals coming from 13 points on the human body, and I want to use them to detect a certain type of disorder (an arousal). Using the LSTM, I want my model to locate the regions where I have that arousal based on these 13 signals.
The whole dataset is 993 patients; for each one I use 13 signals to detect the disorder regions.
If you want the data in 3 dimensions:
(500000 ,13, 993)
# (nb_recods, nb_signals, nb_patient)
For each patient I have 500000 observations of 13 signals; nb_patient is 993.
It is worth noting that the 500000 size doesn't matter, as I can have patients with more or fewer observations than that.
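To make that layout concrete, here is a scaled-down dummy example (3 patients and 100 observations instead of 993 and ~500000; the numbers are placeholders):

```python
import numpy as np

# Placeholder sizes: 3 patients instead of 993, 100 observations
# instead of ~500000, still 13 signals per observation.
n_patients, n_records, n_signals = 3, 100, 13

# One 2D array of shape (n_records, n_signals) per patient; since
# record counts can differ between patients, a list is the natural
# container rather than one rigid 3D array.
patients = [np.random.rand(n_records, n_signals) for _ in range(n_patients)]

print(len(patients), patients[0].shape)  # 3 (100, 13)
```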
Update: here is a sample data of one patient.
Here is a chunk of my data (first 2000 rows).
Upvotes: 2
Views: 2088
Reputation: 602
OK, I made some changes to your code. First, I still don't know what the "200" in your attempt to reshape your data means, so I'm gonna give you working code and let's see if you can use it or modify it to make your code work. The sizes of your input data and your targets have to match. You cannot have an input x_train with 27615 rows (which is the meaning of x_train.shape[0] == 27615) and a target set y_train with 5523000 values.
I took the first two rows from the data example that you provided for this example:
x_sample = [[-17, -7, -7, 0, -5, -18, 73, 9, -282, 28550, 67],
[-21, -16, -7, -6, -8, 15, 60, 6, -239, 28550, 94]]
y_sample = [0, 0]
Let's reshape x_sample:
x_train = np.array(x_sample)
#Here x_train.shape = (2,11), we want to reshape it to (2,11,1) to
#fit the network's input dimension
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
You are using a categorical loss, so you have to change your targets to categorical (check https://keras.io/utils/):
y_train = np.array(y_sample)
y_train = to_categorical(y_train, 2)
Now you have two categories. I assumed two because in the data you provided all the target values are 0, so I don't know how many possible values your target can take. If your target can take 4 possible values, then the number of categories in the to_categorical function would be 4. Every output of your last dense layer will represent a category, and the value of that output the probability of your input belonging to that category.
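If to_categorical is unfamiliar, this is roughly what it does (a plain numpy sketch of the one-hot encoding, not the actual Keras implementation):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Rough numpy equivalent of keras.utils.to_categorical."""
    labels = np.asarray(labels, dtype=int).ravel()
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

print(one_hot([0, 0], 2))
# [[1. 0.]
#  [1. 0.]]
print(one_hot([0, 2, 1], 4).shape)  # (3, 4)
```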
Now, we just have to slightly modify your LSTM model:
def lstm_baseline(x_train, y_train):
    batch_size = 200
    model = Sequential()
    # We replace input_shape with input_dim
    model.add(LSTM(batch_size, input_dim=1,
                   activation='relu', return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(128, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.2))
    # We set the number of outputs to 2, to match the
    # number of categories
    model.add(Dense(2, activation='softmax'))
    model.compile(
        loss='categorical_crossentropy',
        optimizer='rmsprop',
        metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=15)
    return model
Upvotes: 2
Reputation: 597
You may try some modifications like those below:
x_train = x_train.reshape(1999, 1, 13)
# double-check dimensions
x_train.shape
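As a quick sanity check of that reshape with dummy data (random values standing in for the 1999-row, 13-column sample, since the real file isn't included here):

```python
import numpy as np

# Dummy stand-in for the 1999-row, 13-column sample data.
flat = np.random.rand(1999, 13)

# One timestep per sample: (samples, timesteps, features),
# which is what input_shape=(None, 13) below will accept.
x_train = flat.reshape(1999, 1, 13)
print(x_train.shape)  # (1999, 1, 13)
```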
def lstm_baseline(x_train, y_train, batch_size):
    model = Sequential()
    model.add(LSTM(batch_size, input_shape=(None, 13),
                   activation='relu', return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(128, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.2))
    # sigmoid (not softmax) for a single-unit binary output
    model.add(Dense(1, activation='sigmoid'))
    model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])
    return model
Upvotes: 1