How to reshape 3 channel dataset for input to neural network

Question

I am trying to feed the KTH action dataset to a CNN, and I am having difficulty reshaping the data. I have created an array of shape (99, 75, 120, 160), dtype uint8, i.e. 99 videos belonging to one class, each video having 75 frames of 120x160 pixels.

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'), 
                          input_shape=())) 
###need to reshape data in input_shape

Should I specify a Dense layer first?

Here is my code:

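import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, UpSampling2D, Flatten,
                          Reshape, LSTM, TimeDistributed)

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
                          input_shape=(75, 120, 160)))
###need to reshape data in input_shape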

model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True))

model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(1, (3,3), padding='same')))

model.compile(optimizer='adam', loss='mse')

data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
print(data.shape)
(x_train, x_test) = train_test_split(data)

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

print(x_train.shape)
print(x_test.shape)

model.fit(x_train, x_train,
                epochs=100,
                batch_size=1,
                shuffle=False,
                validation_data=(x_test, x_test))

The variables are x_test with shape (25, 75, 120, 160), dtype float32, and x_train with shape (74, 75, 120, 160), dtype float32.

Here is the complete error for the input_shape marked in the comment:

runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
(99, 75, 120, 160)
(74, 75, 120, 160)
(25, 75, 120, 160)
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/shj_k/Desktop/Project/cnn_lstm.py", line 63, in <module>
    validation_data=(x_test, x_test))

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
    exception_prefix='input')

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))

ValueError: Error when checking input: expected time_distributed_403_input to have 5 dimensions, but got array with shape (74, 75, 120, 160)

Thank you for your reply.

Kai Aeberli · Accepted Answer

A couple of things:

The TimeDistributed layer in Keras needs a time dimension; for video processing, this is the 75 frames here.

It also expects each frame to have shape (height, width, channels), e.g. (120, 160, 3), because the wrapped Conv2D works on 3D images. So the TimeDistributed layer's input_shape should be (75, 120, 160, 3), where 3 stands for the RGB channels. If you have greyscale images, 1 as the last dimension should work, i.e. (75, 120, 160, 1).

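The input_shape never includes the sample ("row") dimension of your data, 99 in your case; Keras adds it as the batch dimension. A 4-entry input_shape such as (75, 120, 160, 1) therefore expects 5-dimensional arrays, which is what the error message asks for.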
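For the greyscale case, the missing channel axis can be added with NumPy before training. A minimal sketch, assuming the frames are greyscale (the saved array has no channel axis); the zero-filled array only stands in for the real handclapping.npy data:

```python
import numpy as np

# Stand-in for np.load(".../handclapping.npy"): 99 videos, 75 frames, 120x160 pixels.
# np.zeros is used only so this sketch runs without the real file.
data = np.zeros((99, 75, 120, 160), dtype=np.uint8)

# Add a trailing channel axis of size 1 for greyscale frames:
# (videos, frames, height, width) -> (videos, frames, height, width, channels)
data5d = np.expand_dims(data, axis=-1)  # equivalent to data[..., np.newaxis]

# input_shape leaves out the first (sample) axis; Keras adds the batch axis itself.
input_shape = data5d.shape[1:]

print(data5d.shape)  # (99, 75, 120, 160, 1)
print(input_shape)   # (75, 120, 160, 1)
```

The same input_shape would then go into the first TimeDistributed(Conv2D(...)) layer, and x_train and x_test get the extra axis the same way after the split.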

To check the output shapes created by each layer of the model, put model.summary() after compiling it.

See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed
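The per-frame shapes that model.summary() reports for the encoder can also be worked out by hand. The helper below is a hypothetical sketch, not a Keras function; it assumes Conv2D with padding='same' and stride 1, and MaxPooling2D with its default 'valid' padding (stride equal to the pool size):

```python
def trace_encoder(frame_hw, channels, layers):
    """Return the per-frame output shape after each layer of a
    TimeDistributed Conv2D/MaxPooling2D stack, followed by Flatten.

    layers: list of ("conv", filters) or ("pool", pool_size) tuples.
    """
    h, w = frame_hw
    c = channels
    shapes = []
    for kind, arg in layers:
        if kind == "conv":
            # padding='same', stride 1: height and width unchanged,
            # channel count becomes the number of filters
            c = arg
        elif kind == "pool":
            # default 'valid' padding with stride = pool_size: floor division
            h, w = h // arg, w // arg
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((kind, (h, w, c)))
    shapes.append(("flatten", (h * w * c,)))
    return shapes


encoder = [("conv", 64), ("pool", 2), ("conv", 32), ("pool", 2),
           ("conv", 16), ("pool", 2)]
for name, shape in trace_encoder((120, 160), 1, encoder):
    print(name, shape)
```

The final entry, 15 * 20 * 16 = 4800, is the per-frame feature size that TimeDistributed(Flatten()) passes to the LSTM.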

You can convert images into NumPy arrays of shape (height, width, 3) using keras.preprocessing.image:

from keras.preprocessing import image

img_file_path = "frame_0001.png"  # hypothetical path to one extracted video frame
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
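To build the 5D input from individual frames, the per-frame arrays can be stacked with NumPy. A sketch, assuming each frame has already been converted to a (120, 160, 3) array as above; zero arrays stand in for real frames here:

```python
import numpy as np

# Stand-ins for 75 frames, each shaped like the output of image.img_to_array(...)
frames = [np.zeros((120, 160, 3), dtype=np.float32) for _ in range(75)]

# Stack along a new leading time axis: one video of shape (frames, height, width, channels)
video = np.stack(frames, axis=0)

# Stack videos along another new leading axis: (videos, frames, height, width, channels)
videos = np.stack([video, video], axis=0)

print(video.shape)   # (75, 120, 160, 3)
print(videos.shape)  # (2, 75, 120, 160, 3)
```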

Update: It seems the reason you had to make all images square (128, 128, 1) is that in model.fit() the training examples (x_train) and the labels (normally y_train) are the same array. If you look at the model summary below, the Reshape((8, 8, 1)) after the LSTM turns every frame into an 8x8 square, and the four UpSampling2D((2, 2)) layers enlarge it to 128x128. The model therefore expects square labels. That makes sense: using this model for prediction would transform a (120, 160, 1) image into something of shape (128, 128, 1). Changing the training code as below should therefore work:

from numpy import random

x_train = random.random((90, 5, 120, 160, 1))  # training data
y_train = random.random((90, 5, 128, 128, 1))  # labels
model.fit(x_train, y_train)
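The 128x128 output can be checked by hand: Conv2D with padding='same' keeps the frame size, so only the UpSampling2D((2, 2)) layers change it. A small hypothetical helper (not part of Keras) for that arithmetic:

```python
def upsampled_hw(start_hw, n_upsampling, factor=2):
    """Height and width after n UpSampling2D((factor, factor)) layers."""
    h, w = start_hw
    scale = factor ** n_upsampling
    return h * scale, w * scale


# Question's decoder: Reshape((8, 8, 1)) followed by four UpSampling2D((2, 2)) layers
out_hw = upsampled_hw((8, 8), 4)
print(out_hw)                # (128, 128)
print(out_hw == (120, 160))  # False: the output frames cannot match 120x160 inputs
```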

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64)   320       
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64)     0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32)     18464     
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32)     0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16)     4624      
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16)     0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 5, 64)             1245440   
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1)        0         
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1)      0         
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16)     160       
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16)     0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32)     4640      
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32)     0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64)     18496     
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64)   0         
_________________________________________________________________
time_distributed_16 (TimeDis (None, 5, 128, 128, 1)    577       
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0
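The Param # column follows from the usual counts: a Conv2D layer has (kernel_h * kernel_w * in_channels + 1) * filters weights, and a Keras LSTM has 4 * (units * (input_dim + units) + units). A sketch that reproduces the total, assuming (as the 320 in the first row implies) a 2x2 first kernel on one input channel, as in the Update 2 code; the question's 3x3 first kernel would give 640 instead:

```python
def conv2d_params(kh, kw, in_ch, filters):
    # one weight per (kh, kw, in_ch) position per filter, plus one bias per filter
    return (kh * kw * in_ch + 1) * filters


def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)


encoder = [
    conv2d_params(2, 2, 1, 64),   # 320
    conv2d_params(3, 3, 64, 32),  # 18464
    conv2d_params(3, 3, 32, 16),  # 4624
]
recurrent = [lstm_params(15 * 20 * 16, 64)]  # 1245440; Flatten gives 4800 features
decoder = [
    conv2d_params(3, 3, 1, 16),   # 160
    conv2d_params(3, 3, 16, 32),  # 4640
    conv2d_params(3, 3, 32, 64),  # 18496
    conv2d_params(3, 3, 64, 1),   # 577
]
# Pooling, upsampling, Flatten and Reshape layers have no weights.
total = sum(encoder + recurrent + decoder)
print(total)  # 1292721
```

Almost all parameters sit in the LSTM, because its input is the 4800-value flattened frame.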

Update 2: To make it work with non-square images without changing y, set LSTM(300) and Reshape((15, 20, 1)), and remove one of the Conv2D + UpSampling2D pairs afterwards. Then you can use pictures with shape (120, 160) even in an autoencoder. The trick is to look at the model summary and make sure that, after the LSTM, you start with a shape that the remaining layers turn into (120, 160).
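The sizes in Update 2 can be derived by working backwards from the target frame. A sketch with a hypothetical helper, assuming the decoder changes spatial size only through UpSampling2D((2, 2)) layers (Conv2D with padding='same'), so the LSTM must output exactly as many values per timestep as the Reshape target holds:

```python
def decoder_start(target_hw, n_upsampling, channels=1, factor=2):
    """Work backwards from the target frame size to the Reshape target
    and the LSTM unit count that feeds it."""
    scale = factor ** n_upsampling
    h, w = target_hw
    if h % scale or w % scale:
        raise ValueError(f"{target_hw} is not divisible by {scale}")
    reshape_to = (h // scale, w // scale, channels)
    lstm_units = reshape_to[0] * reshape_to[1] * reshape_to[2]
    return reshape_to, lstm_units


print(decoder_start((120, 160), n_upsampling=3))  # ((15, 20, 1), 300)

# With four upsampling layers, as in the original decoder, 120 / 16 is not an integer
try:
    decoder_start((120, 160), n_upsampling=4)
except ValueError as err:
    print(err)  # (120, 160) is not divisible by 16
```

This is why one Conv2D + UpSampling2D pair has to go: three doublings fit 120x160 exactly, four do not.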

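from numpy import random
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, UpSampling2D, Flatten,
                          Reshape, LSTM, TimeDistributed)

model = Sequential()
model.add(
    TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"),
                    input_shape=(5, 120, 160, 1)))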

model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))

model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))


model.compile(optimizer='adam', loss='mse')

model.summary()

x_train = random.random((90, 5, 120, 160, 1))
y_train = random.random((90, 5, 120, 160, 1))

model.fit(x_train, y_train)
