Make42

Keras: What dimensions exactly do I need to change?

The problem

I am trying to build the convolutional autoencoder from https://blog.keras.io/building-autoencoders-in-keras.html, but in this part of the code

encoded_input = Input(shape=(1, 28, 28))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer(encoded_input))

I get the error

[...]lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
    673       missing_shape_fn = True
    674     else:
--> 675       raise ValueError(err.message)
    676 
    677   if missing_shape_fn:

ValueError: Dimensions must be equal, but are 1 and 16 for 'Conv2D_12' (op: 'Conv2D') with input shapes: [?,28,28,1], [3,3,16,1].

I understand that the problem is mismatched dimensions, and that I need to set

shape of input = [batch, in_height, in_width, in_channels]
shape of filter = [filter_height, filter_width, in_channels, out_channels]

correctly, using 'th' mode (even though TensorFlow is the backend). However, I have no idea how to do it. What exactly do I need to change, and where?
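
For reference, here is a minimal sketch (Keras 1.x; the variable names are just illustrative) of how the two orderings change the expected input shape:

from keras.layers import Input, Convolution2D

# 'th' (Theano-style) ordering: channels first -> shape (channels, height, width)
img_th = Input(shape=(1, 28, 28))
conv_th = Convolution2D(16, 3, 3, border_mode='same', dim_ordering='th')(img_th)

# 'tf' (TensorFlow-style) ordering: channels last -> shape (height, width, channels)
img_tf = Input(shape=(28, 28, 1))
conv_tf = Convolution2D(16, 3, 3, border_mode='same', dim_ordering='tf')(img_tf)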

More information

Input data

The data I load (not MNIST, but with similar dimensions) has this shape:

print(x_train.shape)
print(x_test.shape)

(60000, 784)
(10000, 784)
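
(As an aside, this is what reshaping these flat rows into 'th'-ordered images would look like; a sketch with made-up names, not part of my code below:)

x_train_img = x_train.reshape((len(x_train), 1, 28, 28))  # (60000, 1, 28, 28)
x_test_img = x_test.reshape((len(x_test), 1, 28, 28))     # (10000, 1, 28, 28)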

Code

from keras.layers import Input, Dense, Convolution2D, MaxPooling2D, UpSampling2D
from keras.models import Model
import numpy as np  # needed for np.prod below

# this is the size of our encoded representations
encoding_dim = 32  # 32 floats -> compression of factor 24.5, assuming the input is 784 floats

import tensorflow as tf
tf.merge_all_summaries = tf.summary.merge_all  # see http://stackoverflow.com/questions/40046619/keras-tensorflow-gives-the-error-no-attribute-control-flow-ops
tf.train.SummaryWriter = tf.summary.FileWriter

# this is our input placeholder
input_img = Input(shape=(1, 28, 28))

dim_ordering = 'th' # see http://stackoverflow.com/questions/39848466/tensorflow-keras-convolution2d-valueerror-filter-must-not-be-larger-than-t/39882814

# "encoded" is the encoded representation of the input
x = Convolution2D(16, 3, 3, activation='relu', border_mode='same', dim_ordering=dim_ordering)(input_img)
x = MaxPooling2D((2, 2), border_mode='same', dim_ordering=dim_ordering)(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same', dim_ordering=dim_ordering)(x)
x = MaxPooling2D((2, 2), border_mode='same', dim_ordering=dim_ordering)(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same', dim_ordering=dim_ordering)(x)
encoded = MaxPooling2D((2, 2), border_mode='same', dim_ordering=dim_ordering)(x)

# at this point the representation is (8, 4, 4) i.e. 128-dimensional

# "decoded" is the lossy reconstruction of the input
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same', dim_ordering=dim_ordering)(encoded)
x = UpSampling2D((2, 2), dim_ordering=dim_ordering)(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same', dim_ordering=dim_ordering)(x)
x = UpSampling2D((2, 2), dim_ordering=dim_ordering)(x)
x = Convolution2D(16, 3, 3, activation='relu', dim_ordering=dim_ordering)(x)
x = UpSampling2D((2, 2), dim_ordering=dim_ordering)(x)
decoded = Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same', dim_ordering=dim_ordering)(x)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# this model maps an input to its encoded representation
encoder = Model(input=input_img, output=encoded)

# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(1, 28, 28))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer(encoded_input))

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

from keras.callbacks import TensorBoard

autoencoder.fit(x_train, x_train,
                nb_epoch=3,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

# encode and decode some digits
# note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

# use Matplotlib (don't ask)
import matplotlib.pyplot as plt

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

n = 10
plt.figure(figsize=(20, 8))
for i in range(n):
    ax = plt.subplot(1, n, i+1)
    plt.imshow(encoded_imgs[i].reshape(4, 4 * 8).T)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Complete error message

[...] is the path to the Anaconda environment

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
[...]/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
    669           node_def_str, input_shapes, input_tensors, input_tensors_as_shapes,
--> 670           status)
    671   except errors.InvalidArgumentError as err:

[...]/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
     65             try:
---> 66                 next(self.gen)
     67             except StopIteration:

[...]/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    468           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 469           pywrap_tensorflow.TF_GetCode(status))
    470   finally:

InvalidArgumentError: Dimensions must be equal, but are 1 and 16 for 'Conv2D_7' (op: 'Conv2D') with input shapes: [?,28,28,1], [3,3,16,1].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-9-2658ab5e0418> in <module>()
      4 decoder_layer = autoencoder.layers[-1]
      5 # create the decoder model
----> 6 decoder = Model(input=encoded_input, output=decoder_layer(encoded_input))

[...]/lib/python3.5/site-packages/keras/engine/topology.py in __call__(self, x, mask)
    570         if inbound_layers:
    571             # This will call layer.build() if necessary.
--> 572             self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
    573             # Outputs were already computed when calling self.add_inbound_node.
    574             outputs = self.inbound_nodes[-1].output_tensors

[...]/lib/python3.5/site-packages/keras/engine/topology.py in add_inbound_node(self, inbound_layers, node_indices, tensor_indices)
    633         # creating the node automatically updates self.inbound_nodes
    634         # as well as outbound_nodes on inbound layers.
--> 635         Node.create_node(self, inbound_layers, node_indices, tensor_indices)
    636 
    637     def get_output_shape_for(self, input_shape):

[...]/lib/python3.5/site-packages/keras/engine/topology.py in create_node(cls, outbound_layer, inbound_layers, node_indices, tensor_indices)
    164 
    165         if len(input_tensors) == 1:
--> 166             output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
    167             output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
    168             # TODO: try to auto-infer shape

[...]/lib/python3.5/site-packages/keras/layers/convolutional.py in call(self, x, mask)
    473                           border_mode=self.border_mode,
    474                           dim_ordering=self.dim_ordering,
--> 475                           filter_shape=self.W_shape)
    476         if self.bias:
    477             if self.dim_ordering == 'th':

[...]/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in conv2d(x, kernel, strides, border_mode, dim_ordering, image_shape, filter_shape, filter_dilation)
   2689     if filter_dilation == (1, 1):
   2690         strides = (1,) + strides + (1,)
-> 2691         x = tf.nn.conv2d(x, kernel, strides, padding=padding)
   2692     else:
   2693         assert filter_dilation[0] == filter_dilation[1]

[...]/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, name)
    394                                 strides=strides, padding=padding,
    395                                 use_cudnn_on_gpu=use_cudnn_on_gpu,
--> 396                                 data_format=data_format, name=name)
    397   return result
    398 

[...]/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py in apply_op(self, op_type_name, name, **keywords)
    761         op = g.create_op(op_type_name, inputs, output_types, name=scope,
    762                          input_types=input_types, attrs=attr_protos,
--> 763                          op_def=op_def)
    764         if output_structure:
    765           outputs = op.outputs

[...]/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in create_op(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_shapes, compute_device)
   2395                     original_op=self._default_original_op, op_def=op_def)
   2396     if compute_shapes:
-> 2397       set_shapes_for_outputs(ret)
   2398     self._add_op(ret)
   2399     self._record_op_seen_by_control_dependencies(ret)

[...]/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in set_shapes_for_outputs(op)
   1755       shape_func = _call_cpp_shape_fn_and_require_op
   1756 
-> 1757   shapes = shape_func(op)
   1758   if shapes is None:
   1759     raise RuntimeError(

[...]/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in call_with_requiring(op)
   1705 
   1706   def call_with_requiring(op):
-> 1707     return call_cpp_shape_fn(op, require_shape_fn=True)
   1708 
   1709   _call_cpp_shape_fn_and_require_op = call_with_requiring

[...]/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in call_cpp_shape_fn(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
    608     res = _call_cpp_shape_fn_impl(op, input_tensors_needed,
    609                                   input_tensors_as_shapes_needed,
--> 610                                   debug_python_shape_fn, require_shape_fn)
    611     if not isinstance(res, dict):
    612       # Handles the case where _call_cpp_shape_fn_impl calls unknown_shape(op).

[...]/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
    673       missing_shape_fn = True
    674     else:
--> 675       raise ValueError(err.message)
    676 
    677   if missing_shape_fn:

ValueError: Dimensions must be equal, but are 1 and 16 for 'Conv2D_7' (op: 'Conv2D') with input shapes: [?,28,28,1], [3,3,16,1].


Answers (1)

Nassim Ben

Try to set this in all your convolutional, upsampling and maxpooling layers:

Convolution2D(..., dim_ordering = 'th')
UpSampling2D(..., dim_ordering='th')
MaxPooling2D(..., dim_ordering='th')

You can find information about this feature in the docs.
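
Alternatively, you can set the ordering globally instead of passing it per layer (a sketch; assuming Keras 1.x, where keras.backend exposes set_image_dim_ordering):

from keras import backend as K

# make 'th' (channels-first) the default ordering for every layer
K.set_image_dim_ordering('th')
print(K.image_dim_ordering())  # should print 'th'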

If this doesn't help, would you mind sharing more info about your setup and versions?

EDIT:

This is why you should always include executable code in your questions.

Your mistake is that you copy-pasted the code from the tutorial incorrectly.

When they build this decoder:

encoded_input = Input(shape=(1, 28, 28))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer(encoded_input))

it's made for the previous (dense) autoencoder:

# this is our input placeholder
input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)

Their encoded_input has a shape that doesn't match yours, because you are using the convolutional autoencoder. So you are trying to match:

encoded = MaxPooling2D((2, 2), border_mode='same', dim_ordering=dim_ordering)(x)
# at this point the representation is (8, 4, 4) i.e. 128-dimensional 

that is, an input with shape (1, 28, 28) against a representation with shape (8, 4, 4). The shape of the encoded image differs between the dense autoencoder and the convolutional autoencoder; your input has to match the shape of the encoded tensor of the model you are actually using.
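
If in doubt, you can check the shapes yourself; a quick sketch using your autoencoder and encoder models:

# prints every layer with its output shape; the middle rows show the encoded side
autoencoder.summary()

# or ask the encoder model directly; with 'th' ordering this should print
# (None, 8, 4, 4), where None is the batch dimension
print(encoder.output_shape)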

So the first thing to change is this line:

encoded_input = Input(shape=(8,4,4))

Now, what they do here:

decoder_layer = autoencoder.layers[-1]

is to select the last layer of the model they were using (the dense autoencoder). Why do they do that? Because their decoder is only one layer deep, so they take that layer, apply the input to it, and that is their decoder. Your situation is a bit trickier: you want to use all the 'decoder-side' layers of your convolutional autoencoder. This is how you should do it:

# retrieve the first layer of the decoder and apply the encoded input to it;
# counting the layers, it corresponds to the Convolution2D(8, 3, 3, ...) part of your model
decoder_layer = autoencoder.layers[-7](encoded_input)
# loop over the remaining layers of the decoder until the last one
for i in range(-6, 0):
    decoder_layer = autoencoder.layers[i](decoder_layer)
# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer)
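
To sanity-check the decoder, you can push a dummy encoded batch through it (a sketch; the shape assumes 'th' ordering as above):

import numpy as np

dummy_code = np.zeros((1, 8, 4, 4), dtype='float32')
print(decoder.predict(dummy_code).shape)  # should print (1, 1, 28, 28)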

Is it clearer? You cannot copy/paste partial code and build complex things on top of it without understanding what those parts of the code mean.
