Reputation: 2217
I am trying to build a convolutional autoencoder, but I am having issues with the decoder part. My input images are 32 by 32 by 3 (RGB).
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout

def deep_autoencoder(img_shape, code_size):
    #### encoder ####
    encoder = keras.models.Sequential()
    encoder.add(keras.layers.InputLayer(img_shape))
    encoder.add(Conv2D(32, kernel_size=(3, 3), strides=1,
                       activation='elu', padding='same'))
    encoder.add(MaxPooling2D(pool_size=(3, 3), padding='same'))    # -> (11, 11, 32)
    encoder.add(Conv2D(64, kernel_size=(3, 3), strides=1,
                       activation='elu', padding='same'))
    encoder.add(MaxPooling2D(pool_size=(3, 3), padding='same'))    # -> (4, 4, 64)
    encoder.add(Conv2D(128, kernel_size=(3, 3), strides=1,
                       activation='elu', padding='same'))
    encoder.add(MaxPooling2D(pool_size=(3, 3), padding='same'))    # -> (2, 2, 128)
    encoder.add(Conv2D(256, kernel_size=(3, 3), strides=1,
                       activation='elu', padding='same'))
    encoder.add(MaxPooling2D(pool_size=(3, 3), padding='same'))    # -> (1, 1, 256)
    encoder.add(Flatten())                                         # -> (256,)
    encoder.add(Dense(code_size, activation='relu'))               # -> (code_size,)

    #### decoder ####
    decoder = keras.models.Sequential()
    decoder.add(keras.layers.InputLayer((code_size,)))
    decoder.add(Dense(code_size, activation='relu'))
    decoder.add(keras.layers.Reshape([16, 16]))    # ???
    decoder.add(keras.layers.Conv2DTranspose(filters=128, kernel_size=(3, 3),
                                             strides=2, activation='elu', padding='same'))
    decoder.add(keras.layers.Conv2DTranspose(filters=64, kernel_size=(3, 3),
                                             strides=2, activation='elu', padding='same'))
    decoder.add(keras.layers.Conv2DTranspose(filters=32, kernel_size=(3, 3),
                                             strides=2, activation='elu', padding='same'))
    decoder.add(keras.layers.Conv2DTranspose(filters=3, kernel_size=(3, 3),
                                             strides=2, padding='same'))
    return encoder, decoder
I assume that my decoder should start off with 16*16, as my dense network at the end of my encoder has 256 nodes. However, when I run
encoder, decoder = deep_autoencoder(IMG_SHAPE, code_size=32)
I get the error:
---> 34 decoder.add(keras.layers.Reshape([16,16]))
...
ValueError: total size of new array must be unchanged
I can add the full error trace if it's helpful, but I feel like I have got something very basic wrong. In order to apply the deconvolutional filters, I need to convert the flattened output of the encoder into a matrix.
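For reference, Reshape can only rearrange the elements it is given: the Dense layer above outputs code_size = 32 values, while a 16x16 target needs 256, hence the error. Below is a minimal sketch of one way to make the element counts agree; the 2x2x256 target is an illustrative choice, picked so that the four stride-2 Conv2DTranspose layers end at 32x32:

from keras.models import Sequential
from keras.layers import InputLayer, Dense, Reshape

code_size = 32
d = Sequential()
d.add(InputLayer((code_size,)))
d.add(Dense(2 * 2 * 256, activation='relu'))   # 1024 values, enough for a 2x2x256 block
d.add(Reshape((2, 2, 256)))                    # 1024 == 2 * 2 * 256, so the reshape is valid
# four stride-2 deconvolutions then upsample 2 -> 4 -> 8 -> 16 -> 32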
For ease of reading the network, I have added the model summary for the encoder part, which I get if I comment out the decoder part and run encoder.summary():
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 32)        896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 11, 11, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 4, 128)         73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 2, 128)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 2, 2, 256)         295168
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 1, 1, 256)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 32)                8224
=================================================================
Upvotes: 1
Views: 172
Reputation: 3974
Two things about your model mainly bother me. First, the asymmetry of your autoencoder: you use conv and pooling layers during encoding, but omit an upsampling (inverse pooling) layer in the decoder. This is already implemented in Keras as UpSampling2D. You should also use the same strides in the conv and deconv layers.
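To see the shape effect in isolation, here is a minimal check with an illustrative 4x4x256 input; UpSampling2D simply repeats rows and columns, undoing the pooling step shape-wise:

from keras.models import Sequential
from keras.layers import InputLayer, MaxPooling2D, UpSampling2D

m = Sequential()
m.add(InputLayer((4, 4, 256)))
m.add(MaxPooling2D(pool_size=(2, 2)))   # (4, 4, 256) -> (2, 2, 256)
m.add(UpSampling2D(size=(2, 2)))        # (2, 2, 256) -> (4, 4, 256)
m.summary()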
Secondly, after pooling for the fourth time, you end up with a compressed representation of 1x1x256. Why would you try to convert this into a 16x16x1 representation for the decoding part? This is also about symmetry. There's no need to flatten the encoded layer, you can just use the 1x1x256 representation as input for the decoding model. As you are creating the encoder and decoder as separate models, you can stack them like this:
encoder = Sequential()
encoder.add ...
...
decoder = Sequential()
decoder.add(encoder)
decoder.add ...
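By way of illustration, here is a minimal sketch of what such a symmetric pair could look like. It assumes pool_size=(2, 2) in the encoder so each pooling step halves the spatial size cleanly (32 -> 16 -> 8 -> 4 -> 2) and can be mirrored exactly by an UpSampling2D; the filter counts and the final sigmoid are illustrative choices, not the only ones:

from keras.models import Sequential
from keras.layers import InputLayer, Conv2D, MaxPooling2D, UpSampling2D

# Encoder: each (2, 2) pooling halves the spatial size
encoder = Sequential()
encoder.add(InputLayer((32, 32, 3)))
encoder.add(Conv2D(32, (3, 3), activation='elu', padding='same'))
encoder.add(MaxPooling2D((2, 2), padding='same'))                     # -> (16, 16, 32)
encoder.add(Conv2D(64, (3, 3), activation='elu', padding='same'))
encoder.add(MaxPooling2D((2, 2), padding='same'))                     # -> (8, 8, 64)
encoder.add(Conv2D(128, (3, 3), activation='elu', padding='same'))
encoder.add(MaxPooling2D((2, 2), padding='same'))                     # -> (4, 4, 128)
encoder.add(Conv2D(256, (3, 3), activation='elu', padding='same'))
encoder.add(MaxPooling2D((2, 2), padding='same'))                     # -> (2, 2, 256), the code

# Decoder: stack the encoder, then mirror every pooling step with an UpSampling2D
decoder = Sequential()
decoder.add(encoder)
decoder.add(Conv2D(128, (3, 3), activation='elu', padding='same'))
decoder.add(UpSampling2D((2, 2)))                                     # -> (4, 4, 128)
decoder.add(Conv2D(64, (3, 3), activation='elu', padding='same'))
decoder.add(UpSampling2D((2, 2)))                                     # -> (8, 8, 64)
decoder.add(Conv2D(32, (3, 3), activation='elu', padding='same'))
decoder.add(UpSampling2D((2, 2)))                                     # -> (16, 16, 32)
decoder.add(Conv2D(16, (3, 3), activation='elu', padding='same'))
decoder.add(UpSampling2D((2, 2)))                                     # -> (32, 32, 16)
decoder.add(Conv2D(3, (3, 3), activation='sigmoid', padding='same')) # -> (32, 32, 3)

decoder.compile(optimizer='adam', loss='mse')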
There's also a tutorial on how to create autoencoders written by Francois Chollet (LINK). It might help you with your implementation.
Upvotes: 1