Dimension Issues with Keras Conv2D followed by LSTM

Question

I am attempting to implement image sequence prediction in Python 3.5.2 (Anaconda 4.2.0, 64-bit) on a Windows 10 machine. I have the latest versions of Keras and TensorFlow.

Each image is 160x128. My training set is 1008 images, size 1008x160x128x1. I want to do a simple network with one convolutional layer and one LSTM layer, for now, where each image is convolved to extract features and then fed into the LSTM to learn the time-dependencies. The output should be k (in the case below k=1) predicted images, size 160x128. The code is below as well as the model.summary().

The output of my convolution block (after pooling) is 4-dimensional: (None, 79, 63, 32). I therefore reshape it to (None, 32, 79*63) so that it has the right number of dimensions for the LSTM layer (though I thought this was taken care of behind the scenes...). The model then compiles without error (without the reshape, a dimension error is thrown).

Because each element of my training data is only 1 time point per sample, I do not use TimeDistributed on the convolution layer (after much research, it seems this is the solution). However, I believe for the output layer, all samples come together, so there are as many time points as there are samples and TimeDistributed is to be used. If I do this, then I get the following error:

Traceback (most recent call last):
  File "C:\seqn_pred\read_images_dataset.py", line 104, in <module>
    model.fit(train_x, train_y, epochs = 10, batch_size = 1, verbose = 1)
  File "c:\users\l\anaconda3\lib\site-packages\keras\engine\training.py", line 950, in fit
    batch_size=batch_size)
  File "c:\users\l\anaconda3\lib\site-packages\keras\engine\training.py", line 787, in _standardize_user_data
    exception_prefix='target')
  File "c:\users\l\anaconda3\lib\site-packages\keras\engine\training_utils.py", line 137, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected time_distributed_62 to have shape (32, 1) but got array with shape (128, 160)

I have searched the relevant posts on Stack Overflow and tried their "solutions" with no success. When I set units = 160*128, there is again a shape mismatch: (32, 160*128) versus (128, 160). I also tried reshaping the target data to 1008x(160*128)x1 (flattening each target, since TimeDistributed requires 3-D data), which gives yet another error:

ValueError: Error when checking target: expected time_distributed_64 to have shape (32, 20480) but got array with shape (20480, 1)

I have also attempted to run the last layer without the TimeDistributed, and I still receive an error with respect to the target shape.

ValueError: Error when checking target: expected dense_1 to have shape (32, 1) but got array with shape (160, 128)

The primary issue is with shape/dimension both between the convolution and LSTM layer as well as for the final dense layer. Any help would be much appreciated.

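import numpy as np
import keras as krs
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Reshape, LSTM, TimeDistributed, Dense

# Not shown: D2 (the image sequence), N (its length), rand_indx (training indices), n and m (image dimensions).
train_x, test_x = [D2[i] for i in rand_indx], [D2[i] for i in range(N-1) if i not in rand_indx]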
train_y, test_y = [D2[i+1] for i in rand_indx], [D2[i+1] for i in range(N-1) if i not in rand_indx]

train_x = np.array(train_x)
train_x = train_x.reshape(len(train_x), n, m,1)
train_y = np.array(train_y)
train_y = train_y.reshape(train_y.shape[0], train_y.shape[1]*train_y.shape[2], 1)

model = Sequential()
#model.add(TimeDistributed(Conv2D(filters = 32, kernel_size = (3,3), strides = (1,1), activation = 'relu', padding = 'valid', input_shape = (1, n, m, 1))))
#model.add(TimeDistributed(MaxPooling2D(pool_size = (3,3))))
#model.add(TimeDistributed(Dropout(0.30)))
#model.add(TimeDistributed(Flatten()))
model.add(Conv2D(filters = 32, kernel_size = (3,3), strides = (1,1), activation = 'relu', padding = 'valid', input_shape = (n, m, 1)))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.30))
model.add(Reshape((32,-1)))
model.add(LSTM(units = 20, activation = 'relu', return_sequences = True))
model.add(Dropout(0.1))
model.add(TimeDistributed(Dense(1, activation = 'relu')))
optim = krs.optimizers.Adam(lr = 0.375)
model.compile(loss = 'mse', optimizer = optim)
model.fit(train_x, train_y, epochs = 10, batch_size = 1, verbose = 1)

model.summary()

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_73 (Conv2D)           (None, 158, 126, 32)      320       
_________________________________________________________________
max_pooling2d_53 (MaxPooling (None, 79, 63, 32)        0         
_________________________________________________________________
dropout_93 (Dropout)         (None, 79, 63, 32)        0         
_________________________________________________________________
reshape_13 (Reshape)         (None, 32, 4977)          0         
_________________________________________________________________
lstm_57 (LSTM)               (None, 32, 20)            399840    
_________________________________________________________________
dropout_94 (Dropout)         (None, 32, 20)            0         
_________________________________________________________________
dense_44 (Dense)             (None, 32, 1)             21        
=================================================================
Total params: 400,181
Trainable params: 400,181
Non-trainable params: 0
_________________________________________________________________
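
As a sanity check, the shapes and parameter counts in the summary above can be reproduced by hand. This is a minimal sketch in plain Python, assuming n = 160 and m = 128 as described in the question. The helper names (`valid_conv_out`, `pool_out`, `lstm_params`) are made up for illustration and are not part of Keras.

```python
def valid_conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output length of non-overlapping max pooling (stride equals pool size)."""
    return size // pool


def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer: 4 gates, each with
    an input kernel, a recurrent kernel and a bias vector."""
    return 4 * ((input_dim + units) * units + units)


n, m, filters, units = 160, 128, 32, 20

conv_h, conv_w = valid_conv_out(n, 3), valid_conv_out(m, 3)
pool_h, pool_w = pool_out(conv_h, 2), pool_out(conv_w, 2)
features = pool_h * pool_w  # size of the last axis after Reshape((32, -1))

conv_params = (3 * 3 * 1 + 1) * filters  # 3x3 kernel, 1 input channel, plus bias
dense_params = units * 1 + 1             # Dense(1) applied to 20 LSTM units

print((conv_h, conv_w))                  # (158, 126)
print((pool_h, pool_w), features)        # (79, 63) 4977
print(conv_params, lstm_params(features, units), dense_params)  # 320 399840 21
```

The length-32 axis seen by the LSTM comes from the Reshape, not from 32 consecutive images, which is why the output layer expects a target of shape (32, 1).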

thushv89 · Accepted Answer

I'm a bit perplexed about what you're trying to achieve here, but here are my 2 cents.

Input: (1008, 160, 128, 1)

Output: (1008, 160*128)

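If you have a single output target per sample, you should not use return_sequences=True in the LSTM layer, and there is no need for a TimeDistributed layer. Without return_sequences=True, the LSTM returns only its final output, of shape (None, 20), and a Dense layer with 160*128 units then produces one flattened image per sample. The last part of the model needs to change as follows.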

model.add(Reshape((32,-1)))
model.add(LSTM(units = 20, activation = 'relu'))
model.add(Dropout(0.1))
model.add(Dense(160*128, activation = 'relu'))

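If you make the above changes, you can train the model with inputs and outputs of the shapes listed above. Note that the targets must be reshaped to (1008, 160*128), without the trailing dimension of size 1 used in your code.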
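
For illustration, here is a small NumPy sketch of the target shapes involved. The arrays are random stand-ins for the real data, the names `frames`, `train_y_flat` and `pred_images` are hypothetical, and only 4 samples are used instead of 1008 to keep it light.

```python
import numpy as np

n, m = 160, 128
num_samples = 4  # stand-in for the 1008 training images

# Random stand-in for the target images, one (n, m) image per sample.
rng = np.random.default_rng(0)
frames = rng.random((num_samples, n, m), dtype=np.float32)

# Flatten each target image to a vector so it matches Dense(160*128),
# whose output shape is (batch, 20480). There is no trailing axis of size 1.
train_y_flat = frames.reshape(num_samples, n * m)

# A prediction from the model would have the same (batch, 20480) shape;
# reshaping it back recovers the image layout.
pred_flat = train_y_flat.copy()  # placeholder for the output of model.predict(...)
pred_images = pred_flat.reshape(-1, n, m)

print(train_y_flat.shape)  # (4, 20480)
print(pred_images.shape)   # (4, 160, 128)
```

Because the flattening and the inverse reshape use the same row-major order, the round trip returns each image unchanged.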

But there's a red flag you might want to consider.

It's the way you reshape the convolution output. What's the purpose? If you want each channel to be a separate input (time step) to the LSTM, you first need to permute the axes so that the channel dimension comes first and stays intact. The tensor is channels-last, (None, 79, 63, 32), so reshaping it directly to (32, -1) does not group values by channel. The way you do it right now (in my opinion) sends something very random to the LSTM layer. Here's the change I'm proposing:

model.add(Permute([3,1,2]))
model.add(Dropout(0.30))
model.add(Reshape((32,-1)))
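
The effect of the permutation can be seen on a tiny example. This is a NumPy sketch rather than Keras code, assuming that Reshape flattens in the same row-major order as `numpy.reshape`; the small sizes (3, 2, 4) stand in for (79, 63, 32), and the per-sample `transpose(2, 0, 1)` plays the role of Permute([3,1,2]), which does not count the batch axis.

```python
import numpy as np

h, w, c = 3, 2, 4  # tiny stand-in for (79, 63, 32)

# Channels-last feature map in which every value equals its channel index,
# so it is easy to see which channel an entry came from.
x = np.broadcast_to(np.arange(c), (h, w, c)).copy()

# Direct reshape of the channels-last tensor to (channels, -1).
direct = x.reshape(c, -1)

# Move the channel axis first, then reshape.
permuted = x.transpose(2, 0, 1).reshape(c, -1)

print(direct[0])    # [0 1 2 3 0 1]  -> values from several channels
print(permuted[0])  # [0 0 0 0 0 0]  -> values from channel 0 only
```

With the direct reshape each row mixes values from different channels, whereas after the permutation row k contains exactly the feature map of channel k.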
