Shlomi Schwartz

Reputation: 8903

How to match input and output shapes of Conv2D AutoEncoder

I have a set of black-and-white images, each with shape (1000, 11, 1). I'm trying to modify the Keras MNIST example to work with my data, so I've written the following code:

input_img = layers.Input(shape=(1000, 11, 1))

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

Printing the summary, I can see that the output shape is different from the input shape:

Model: "model_16"
Layer (type)                 Output Shape              Param #   
input_18 (InputLayer)        [(None, 1000, 11, 1)]     0         
conv2d_119 (Conv2D)          (None, 1000, 11, 16)      160       
max_pooling2d_51 (MaxPooling (None, 500, 6, 16)        0         
conv2d_120 (Conv2D)          (None, 500, 6, 8)         1160      
max_pooling2d_52 (MaxPooling (None, 250, 3, 8)         0         
conv2d_121 (Conv2D)          (None, 250, 3, 8)         584       
max_pooling2d_53 (MaxPooling (None, 125, 2, 8)         0         
conv2d_122 (Conv2D)          (None, 125, 2, 8)         584       
up_sampling2d_51 (UpSampling (None, 250, 4, 8)         0         
conv2d_123 (Conv2D)          (None, 250, 4, 8)         584       
up_sampling2d_52 (UpSampling (None, 500, 8, 8)         0         
conv2d_124 (Conv2D)          (None, 498, 6, 16)        1168      
up_sampling2d_53 (UpSampling (None, 996, 12, 16)       0         
conv2d_125 (Conv2D)          (None, 996, 12, 1)        145       
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0

And in fact, the training fails with an error:

ValueError: logits and labels must have the same shape ((None, 996, 12, 1) vs (None, 1000, 11, 1))

What am I doing wrong? How can I fix my code to work with my image dimensions?

Upvotes: 1

Views: 733

Answers (1)



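The mismatch has two causes. First, MaxPooling2D with padding='same' rounds odd sizes up, so the width goes 11 -> 6 -> 3 -> 2, and upsampling 2 back up can only give multiples of 2 (4, 8, 16), never 11. Second, the third Conv2D in your decoder has no padding='same', so it removes 2 pixels from each spatial dimension (500 x 8 becomes 498 x 6) before the last upsampling.

You can modify the decoder as follows so that its output shape matches the input shape of the encoder. The decoder upsamples by 2 and then by 4, which restores the height to 1000 and overshoots the width to 16. A Cropping2D layer, which crops along the spatial dimensions (height and width), then trims the width back to 11.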

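# Assumption: the tf.keras namespace; adjust the import if you use standalone Keras
from tensorflow.keras import layers, Model

input_img = layers.Input(shape=(1000, 11, 1))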

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((4, 4))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
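# Add a cropping layer: trim 5 columns so the width goes from 16 back to 11.
# Assumption: all 5 columns are cropped from the right; the summary below
# only confirms the total amount cropped, not how it is split.
decoded = layers.Cropping2D(cropping=((0, 0), (0, 5)))(decoded)

autoencoder = Model(input_img, decoded)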

Output of model.summary():

Model: "model_7"
 Layer (type)                Output Shape              Param #   
 input_9 (InputLayer)        [(None, 1000, 11, 1)]     0         
 conv2d_49 (Conv2D)          (None, 1000, 11, 16)      160       
 max_pooling2d_24 (MaxPoolin  (None, 500, 6, 16)       0         
 conv2d_50 (Conv2D)          (None, 500, 6, 8)         1160      
 max_pooling2d_25 (MaxPoolin  (None, 250, 3, 8)        0         
 conv2d_51 (Conv2D)          (None, 250, 3, 8)         584       
 max_pooling2d_26 (MaxPoolin  (None, 125, 2, 8)        0         
 conv2d_52 (Conv2D)          (None, 125, 2, 8)         584       
 up_sampling2d_24 (UpSamplin  (None, 250, 4, 8)        0         
 conv2d_53 (Conv2D)          (None, 250, 4, 8)         584       
 up_sampling2d_25 (UpSamplin  (None, 1000, 16, 8)      0         
 conv2d_54 (Conv2D)          (None, 1000, 16, 1)       73        
 cropping2d_6 (Cropping2D)   (None, 1000, 11, 1)       0         
Total params: 3,145
Trainable params: 3,145
Non-trainable params: 0
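
To check the shapes without building a model, the arithmetic behind both summaries can be reproduced in plain Python. This is only a sketch: the helper names (pool_same, upsample, conv_valid, encode, original_decode, fixed_decode) are made up for illustration and are not Keras functions. It relies on the usual rules that 'same' pooling gives ceil(size / factor), upsampling multiplies the size, a stride-1 'same' convolution keeps the size, and a 'valid' 3x3 convolution removes 2 pixels.

```python
import math

def pool_same(size, factor=2):
    # MaxPooling2D(padding='same'): odd sizes are rounded up
    return math.ceil(size / factor)

def upsample(size, factor=2):
    # UpSampling2D: every pixel is repeated `factor` times
    return size * factor

def conv_valid(size, kernel=3):
    # Conv2D without padding='same' (i.e. 'valid'), stride 1
    return size - kernel + 1

def encode(size):
    # Three pooling stages; the 'same' convolutions in between keep the size
    for _ in range(3):
        size = pool_same(size)
    return size

def original_decode(size):
    # Decoder from the question: up, up, 'valid' conv, up
    size = upsample(upsample(size))
    size = conv_valid(size)
    return upsample(size)

def fixed_decode(size):
    # Decoder from this answer, before cropping: up by 2, then up by 4
    return upsample(upsample(size), factor=4)

input_shape = (1000, 11)
encoded_shape = tuple(encode(s) for s in input_shape)
original_shape = tuple(original_decode(s) for s in encoded_shape)
fixed_shape = tuple(fixed_decode(s) for s in encoded_shape)
# Rows/columns that Cropping2D has to remove to get back to the input shape
crop = tuple(f - i for f, i in zip(fixed_shape, input_shape))

print(encoded_shape, original_shape, fixed_shape, crop)
```

The computed shapes agree with the two summaries: (996, 12) for the decoder in the question and (1000, 16) before cropping here, leaving 0 rows and 5 columns to crop.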

Upvotes: 1
