Reputation: 8903
I have a set of black-and-white images with shape (1000, 11, 1). I'm trying to modify the Keras MNIST autoencoder example to work with my data, so I've written the following code:
from tensorflow.keras import layers, Model

input_img = layers.Input(shape=(1000, 11, 1))
# Encoder
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
# Decoder
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Printing the summary, I can see that the output shape is different from the input shape:
Model: "model_16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_18 (InputLayer) [(None, 1000, 11, 1)] 0
_________________________________________________________________
conv2d_119 (Conv2D) (None, 1000, 11, 16) 160
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 500, 6, 16) 0
_________________________________________________________________
conv2d_120 (Conv2D) (None, 500, 6, 8) 1160
_________________________________________________________________
max_pooling2d_52 (MaxPooling (None, 250, 3, 8) 0
_________________________________________________________________
conv2d_121 (Conv2D) (None, 250, 3, 8) 584
_________________________________________________________________
max_pooling2d_53 (MaxPooling (None, 125, 2, 8) 0
_________________________________________________________________
conv2d_122 (Conv2D) (None, 125, 2, 8) 584
_________________________________________________________________
up_sampling2d_51 (UpSampling (None, 250, 4, 8) 0
_________________________________________________________________
conv2d_123 (Conv2D) (None, 250, 4, 8) 584
_________________________________________________________________
up_sampling2d_52 (UpSampling (None, 500, 8, 8) 0
_________________________________________________________________
conv2d_124 (Conv2D) (None, 498, 6, 16) 1168
_________________________________________________________________
up_sampling2d_53 (UpSampling (None, 996, 12, 16) 0
_________________________________________________________________
conv2d_125 (Conv2D) (None, 996, 12, 1) 145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
And in fact, the training fails with an error:
ValueError: logits and labels must have the same shape ((None, 996, 12, 1) vs (None, 1000, 11, 1))
What am I doing wrong? How can I fix my code to work with my image dimensions?
Upvotes: 1
Views: 733
Reputation:
The shape mismatch arises because the width of 11 is odd: each MaxPooling2D with padding='same' rounds it up (11 → 6 → 3 → 2), and the repeated ×2 upsampling plus the one Conv2D without padding='same' cannot bring the shapes back to exactly (1000, 11). You can modify the decoder as follows so that its output shape matches the encoder's input shape. The Cropping2D layer crops along the spatial dimensions, i.e. height and width.
from tensorflow.keras import layers, Model

input_img = layers.Input(shape=(1000, 11, 1))
# Encoder (unchanged from the question)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
# Decoder: the final UpSampling2D uses (4, 4) so the height goes 125 -> 250 -> 1000
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((4, 4))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
# Add a cropping layer: trim the width from 16 back to 11 (3 columns from one side, 2 from the other)
decoded = layers.Cropping2D(cropping=((0, 0), (3, 2)))(decoded)
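Building and compiling the model can then follow the question's own setup; a minimal sketch (nothing here beyond the lines the question already uses, with the variable named model to match the summary call below):
model = Model(input_img, decoded)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()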
Output of model.summary():
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 1000, 11, 1)] 0
conv2d_49 (Conv2D) (None, 1000, 11, 16) 160
max_pooling2d_24 (MaxPooling2D) (None, 500, 6, 16) 0
conv2d_50 (Conv2D) (None, 500, 6, 8) 1160
max_pooling2d_25 (MaxPooling2D) (None, 250, 3, 8) 0
conv2d_51 (Conv2D) (None, 250, 3, 8) 584
max_pooling2d_26 (MaxPooling2D) (None, 125, 2, 8) 0
conv2d_52 (Conv2D) (None, 125, 2, 8) 584
up_sampling2d_24 (UpSampling2D) (None, 250, 4, 8) 0
conv2d_53 (Conv2D) (None, 250, 4, 8) 584
up_sampling2d_25 (UpSampling2D) (None, 1000, 16, 8) 0
conv2d_54 (Conv2D) (None, 1000, 16, 1) 73
cropping2d_6 (Cropping2D) (None, 1000, 11, 1) 0
=================================================================
Total params: 3,145
Trainable params: 3,145
Non-trainable params: 0
_________________________________________________________________
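With the decoder output now matching the (None, 1000, 11, 1) input shape, the binary cross-entropy loss no longer raises the ValueError and training can run. A hedged sketch of a fit call, using randomly generated placeholder data; replace X with your real array of shape (num_samples, 1000, 11, 1), scaled to [0, 1]:
import numpy as np

# Placeholder standing in for your real black-and-white images, scaled to [0, 1]
X = np.random.rand(64, 1000, 11, 1).astype('float32')

# An autoencoder learns to reconstruct its own input, so X is both input and target
model.fit(X, X, epochs=10, batch_size=16, validation_split=0.1)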
Upvotes: 1