Reputation: 1752
The goal of the model is to classify video inputs according to the word articulated in them. Each input has shape (45, 1, 100, 150): 45 frames, 1 grayscale channel, 100 pixel rows, and 150 pixel columns. Each corresponding output is a one-hot encoded representation of one of 3 possible words (e.g. "yes" => [0, 0, 1]).
During the compilation of the model, the following error occurs:
ValueError: Dimensions must be equal, but are 1 and 3 for 'Conv2D_94' (op: 'Conv2D') with
input shapes: [?,100,150,1], [3,3,3,32].
Here is the script used to train the model:
video = Input(shape=(self.frames_per_sequence,
                     1,
                     self.rows,
                     self.columns))
cnn = InceptionV3(weights="imagenet",
                  include_top=False)
cnn.trainable = False
encoded_frames = TimeDistributed(cnn)(video)
encoded_vid = LSTM(256)(encoded_frames)
hidden_layer = Dense(output_dim=1024, activation="relu")(encoded_vid)
outputs = Dense(output_dim=class_count, activation="softmax")(hidden_layer)
osr = Model([video], outputs)
optimizer = Nadam(lr=0.002,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-08,
                  schedule_decay=0.004)
osr.compile(loss="categorical_crossentropy",
            optimizer=optimizer,
            metrics=["categorical_accuracy"])
Upvotes: 3
Views: 8291
Reputation: 37691
According to Convolution2D in Keras, the input and filter are expected to have the following shapes:
shape of input = [batch, in_height, in_width, in_channels]
shape of filter = [filter_height, filter_width, in_channels, out_channels]
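The channel-matching rule can be checked directly from the two shapes in the traceback (a plain-Python sketch; the tuple variable names are my own):

```python
# Shapes taken from the error message, laid out per the Conv2D convention above.
input_shape = (1, 100, 150, 1)   # [batch, in_height, in_width, in_channels]
filter_shape = (3, 3, 3, 32)     # [filter_height, filter_width, in_channels, out_channels]

# Conv2D requires input_shape[3] == filter_shape[2]; here 1 != 3,
# which is exactly the "Dimensions must be equal, but are 1 and 3" error.
print(input_shape[3], filter_shape[2])
```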
So, the meaning of the error you are getting:

ValueError: Dimensions must be equal, but are 1 and 3 for 'Conv2D_94' (op: 'Conv2D') with
input shapes: [?,100,150,1], [3,3,3,32].

In [?,100,150,1] the in_channels value is 1, whereas in [3,3,3,32] the in_channels value is 3. That's why you are getting the error: Dimensions must be equal, but are 1 and 3.
So you can change the shape of the filter to [3, 3, 1, 32].
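Alternatively, since the pretrained ImageNet weights of InceptionV3 were trained on 3-channel images, you can keep the filters as they are and adapt the data instead: move the channel axis to the end (channels-last, matching the [?,100,150,1] layout in the error) and replicate the gray channel three times. A minimal NumPy sketch, assuming a small batch of the (45, 1, 100, 150) inputs described in the question:

```python
import numpy as np

# Hypothetical batch of 2 sequences shaped as in the question:
# (batch, frames, channels, rows, cols).
x = np.random.rand(2, 45, 1, 100, 150).astype("float32")

# Move the channel axis to the end: (batch, frames, rows, cols, 1).
x = np.transpose(x, (0, 1, 3, 4, 2))

# Replicate the gray channel 3x so in_channels matches the
# pretrained filters' expectation of 3.
x = np.repeat(x, 3, axis=-1)

print(x.shape)  # (2, 45, 100, 150, 3)
```

The model's Input shape would then be (frames_per_sequence, rows, columns, 3) to match this layout.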
Upvotes: 3