Reputation: 65
I am trying to feed the KTH action dataset to a CNN, but I am having difficulty reshaping the data. I have created an array of shape (99, 75, 120, 160) and type uint8, i.e. 99 videos belonging to one class, each video having 75 frames of 120x160 pixels.
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
input_shape=()))
###need to reshape data in input_shape
Should I specify a dense layer first?
Here is my code:
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
input_shape=(75,120,160)))
###need to reshape data in input_shape
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True))
model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(1, (3,3), padding='same')))
model.compile(optimizer='adam', loss='mse')
data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
print (data.shape)
(x_train,x_test) = train_test_split(data)
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print (x_train.shape)
print (x_test.shape)
model.fit(x_train, x_train,
epochs=100,
batch_size=1,
shuffle=False,
validation_data=(x_test, x_test))
The variables are x_test with shape (25, 75, 120, 160) and type float32, and x_train with shape (74, 75, 120, 160) and type float32.
The complete error for the one mentioned in the comments is:
runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
(99, 75, 120, 160)
(74, 75, 120, 160)
(25, 75, 120, 160)
Traceback (most recent call last):
  File "", line 1, in
    runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/shj_k/Desktop/Project/cnn_lstm.py", line 63, in
    validation_data=(x_test, x_test))
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected time_distributed_403_input to have 5 dimensions, but got array with shape (74, 75, 120, 160)
Thank you for your reply.
Upvotes: 1
Views: 4883
Reputation: 1220
A couple of things:
The TimeDistributed layer in Keras needs a time dimension; for video processing this is the number of frames, 75 here.
Each frame must also have a channel dimension, i.e. shape (120, 160, 3), so the TimeDistributed layer's input_shape should be (75, 120, 160, 3). The 3 stands for the RGB channels. If you have greyscale images, 1 as the last dimension should work.
The input_shape always omits the batch ("row") dimension of your examples, in your case 99.
To check the output shapes produced by each layer of the model, call model.summary() after compiling it.
See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed
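As a minimal sketch of the reshaping step, assuming the frames are greyscale and stored as a NumPy array laid out as (videos, frames, height, width), a trailing channel axis can be added with np.expand_dims. The zero-filled array below is a small placeholder, not the real dataset.

```python
import numpy as np

# Placeholder for the loaded dataset: 4 greyscale videos, 75 frames, 120x160 pixels.
# (The array in the question has 99 videos; the values here are dummies.)
data = np.zeros((4, 75, 120, 160), dtype=np.uint8)

# Add a trailing channel axis so each frame becomes (120, 160, 1).
data_5d = np.expand_dims(data, axis=-1)

# input_shape omits the batch (video) axis.
input_shape = data_5d.shape[1:]

print(data_5d.shape)  # (4, 75, 120, 160, 1)
print(input_shape)    # (75, 120, 160, 1)
```

The resulting 5-dimensional array is what the "expected ... to have 5 dimensions" error is asking for.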
You can convert images into NumPy arrays with shape (X, Y, 3) using keras.preprocessing.image.
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
Update: It seems the reason you had to make all images square (128, 128, 1) is that in model.fit() the training examples (x_train) and the labels (normally y_train) are the same array. If you look at the model summary below, everything after the LSTM is square: the Reshape layer starts from (8, 8, 1) and four upsampling steps bring it to (128, 128, 1). The model is therefore expecting the labels to be square. It makes sense: using this model for prediction would transform a (120, 160, 1) image into something of shape (128, 128, 1). Changing the model training to the code below should therefore work:
import numpy as np
x_train = np.random.random((90, 5, 120, 160, 1))  # training data
y_train = np.random.random((90, 5, 128, 128, 1))  # labels
model.fit(x_train, y_train)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64) 320
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32) 18464
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32) 0
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16) 4624
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16) 0
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 5, 64) 1245440
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1) 0
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1) 0
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16) 160
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16) 0
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32) 4640
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32) 0
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64) 18496
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64) 0
_________________________________________________________________
time_distributed_16 (TimeDis (None, 5, 128, 128, 1) 577
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0
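The shapes in the summary above can be reproduced with a little arithmetic. The sketch below is a hypothetical helper (not part of Keras) that assumes 2x2 pooling, 2x2 upsampling, and convolutions with padding='same', which leave height and width unchanged.

```python
def trace_autoencoder_shape(height, width, n_pool, seed_hw, n_upsample):
    """Return (encoder_hw, decoder_hw) for 2x2 pooling and 2x2 upsampling.

    Only pooling and upsampling change height and width here, because
    the convolutions are assumed to use padding='same'.
    """
    # Each MaxPooling2D((2, 2)) halves both dimensions (floor division).
    enc_h, enc_w = height, width
    for _ in range(n_pool):
        enc_h, enc_w = enc_h // 2, enc_w // 2

    # The decoder starts from the Reshape target, not from the encoder output.
    dec_h, dec_w = seed_hw
    for _ in range(n_upsample):
        dec_h, dec_w = dec_h * 2, dec_w * 2
    return (enc_h, enc_w), (dec_h, dec_w)


encoder_hw, decoder_hw = trace_autoencoder_shape(
    120, 160, n_pool=3, seed_hw=(8, 8), n_upsample=4
)
print(encoder_hw)  # (15, 20)
print(decoder_hw)  # (128, 128)
```

The decoder output depends only on the Reshape target and the number of upsampling layers, which is why the output is 128x128 regardless of the 120x160 input.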
Update 2: To make it work with non-square images without changing y, set LSTM(300) and Reshape((15, 20, 1)), and remove one of the Conv2D + UpSampling2D pairs afterwards. Then you can use pictures with shape (120, 160) even in an autoencoder. The trick is to look at the model summary and make sure that after the LSTM you start with the right shape, so that after all the remaining layers the end result has shape (120, 160).
import numpy as np
from keras.models import Sequential
from keras.layers import (Conv2D, Flatten, LSTM, MaxPooling2D, Reshape,
                          TimeDistributed, UpSampling2D)

model = Sequential()
# Encoder: input is 5 frames of 120x160 greyscale images
model.add(TimeDistributed(Conv2D(64, (2, 2), activation='relu', padding='same'),
                          input_shape=(5, 120, 160, 1)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
# 300 units = 15 * 20 * 1, so the output can be reshaped to (15, 20, 1)
model.add(LSTM(units=300, return_sequences=True))
model.add(TimeDistributed(Reshape((15, 20, 1))))
# Decoder: three 2x upsamplings take (15, 20) to (120, 160)
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))
model.compile(optimizer='adam', loss='mse')
model.summary()
x_train = np.random.random((90, 5, 120, 160, 1))
y_train = np.random.random((90, 5, 120, 160, 1))
model.fit(x_train, y_train)
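The choice of LSTM(300) and Reshape((15, 20, 1)) can also be worked out backwards from the target size. The helper below is a hypothetical sketch (its name and signature are made up for illustration) that assumes every upsampling layer doubles height and width.

```python
def decoder_seed(target_hw, n_upsample, channels=1):
    """Return (reshape_shape, lstm_units) so that n_upsample 2x2
    upsamplings starting from reshape_shape reach target_hw."""
    factor = 2 ** n_upsample
    height, width = target_hw
    if height % factor or width % factor:
        raise ValueError("target size must be divisible by 2**n_upsample")
    reshape_shape = (height // factor, width // factor, channels)
    # The LSTM must output exactly as many values as the Reshape target holds.
    lstm_units = reshape_shape[0] * reshape_shape[1] * channels
    return reshape_shape, lstm_units


reshape_shape, lstm_units = decoder_seed((120, 160), n_upsample=3)
print(reshape_shape)  # (15, 20, 1)
print(lstm_units)     # 300
```

With four upsampling layers the helper raises ValueError, because 120 is not divisible by 16; that is consistent with dropping one Conv2D + UpSampling2D pair for 120x160 frames.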
Upvotes: 1
Reputation: 65
Thanks to Mr. Kai Aeberli for his assistance. I was able to run the model after resizing the images to 128x128. The size of the dataset may cause the system to crash in the absence of a GPU, so reduce the size as necessary. Please refer to the whole comment section if you have doubts. You can find the code here on GitHub.
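One possible way to reduce the size, sketched here as an assumption rather than what was actually done, is to keep only every fifth frame of each video. The placeholder array and the helper float32_mebibytes are illustrative only; the estimate covers just the activations of the first 64-filter convolution for a single video.

```python
import numpy as np

# Placeholder with the per-video layout from the question (fewer videos here).
data = np.zeros((4, 75, 120, 160), dtype=np.uint8)

# Keep every 5th frame: 75 frames become 15, the frame size is unchanged.
reduced = data[:, ::5]


def float32_mebibytes(shape):
    """Size in MiB of a float32 array with the given shape."""
    return int(np.prod(shape)) * 4 / 2**20


# Output of the first Conv2D (64 filters, padding='same') for one video.
full_mib = float32_mebibytes((75, 120, 160, 64))
reduced_mib = float32_mebibytes((15, 120, 160, 64))

print(reduced.shape)  # (4, 15, 120, 160)
print(full_mib, reduced_mib)
```

If the frames are subsampled this way, the first dimension of input_shape has to change from 75 to 15 to match.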
Upvotes: 0