Reputation: 3
I'm building a CNN with Keras that predicts the coordinates of 13 keypoints in every image. The images vary in size, so my input layer has shape (None, None, 3). I am using Inception modules, so I am using the Functional API. While coding the last layers of the model, I ran into a problem: as far as I know, my output layer will be a Dense(26) layer, since I will encode the x and y coordinates as a vector. However, I have trouble connecting the output layer to the preceding convolutional layers (because of the tensor dimensions).
from keras.layers import Input, Conv2D, Dropout, Activation, Dense, Lambda, concatenate
from keras import backend as K

x = Input(shape=(None, None, 3))
stage_1 = Conv2D(26, (1, 1))(x)
stage_1 = Dropout(0.3)(stage_1)
stage_2 = Conv2D(512, (1, 1))(x)
stage_2 = Dropout(0.3)(stage_2)
stage_2 = Activation('relu')(stage_2)
x = concatenate([stage_1, stage_2])
x = Lambda(lambda i: K.batch_flatten(i))(x)
outputs = Dense(26)(x)
I tried including a Flatten layer (but it is not compatible with arbitrary input shapes), and I tried using K.batch_flatten() in a Lambda layer (which also did not work). My question is: is there a different way to get an output layer of a similar shape? ((13, 2) would also be fine; I have only found models online where the output layer is a Dense layer.) I also tried GlobalAveragePooling2D(), but this greatly decreased the accuracy of the model. Using a function to compute the output shape did not work either, see below:
stage_1 = Conv2D(26, (1, 1))(x)
stage_1 = Dropout(0.3)(stage_1)
stage_2 = Conv2D(512, (1, 1))(x)
stage_2 = Dropout(0.3)(stage_2)
stage_2 = Activation('relu')(stage_2)
x = concatenate([stage_1, stage_2])
def output_shape_batch(tensor_shape):
    print(tensor_shape)
    return (batch_size, tensor_shape[1] * tensor_shape[2] * tensor_shape[3])
x = Lambda(lambda i: K.batch_flatten(i), output_shape=output_shape_batch)(x)
outputs = Dense(26)(x)
I expect the model to compile, but instead I get a TypeError:
TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType'
Upvotes: 0
Views: 457
Reputation: 2267
To the best of my knowledge, what you ask for is sadly not possible. I'll first try to explain why, and then give you some options for what you could do instead.
A neural network usually expects a fixed-size input. Since every value of that input will be connected to a weight, the size of the input is needed to calculate the number of weights when initializing the model. Inputs of varying size are generally not possible, because they would change the number of weights, and it would be unclear which weights to choose and how to train them in that case.
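To illustrate the point about weight counts (a minimal sketch using tensorflow.keras; the layer sizes are arbitrary placeholders, not from your model): the number of weights in a Dense layer depends directly on the size of its input, so that size must be known up front.

```python
# The weight count of a Dense layer depends on its input dimension,
# which is why the input size must be fixed before the model is built.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

for n in (10, 20):
    inp = Input(shape=(n,))
    model = Model(inp, Dense(26)(inp))
    # weights = n inputs x 26 units + 26 biases
    print(n, "->", model.count_params())
# 10 -> 286
# 20 -> 546
```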
Convolutional layers are an exception to this. They use a fixed-size kernel, so the number of weights does not depend on the input size, which is why Keras supports these 'variable size' inputs. However, the input size of a convolutional layer determines its output size. This is not a problem if the next layer is also a convolutional layer, but when a dense layer is added, the input size has to be fixed. Usually a global pooling layer is used to reduce a variable-sized output to a fixed size; then the dense layer can be added without a problem.
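A minimal sketch of that pattern (using tensorflow.keras; the filter counts are placeholders): convolutional layers accept a variable-size input, and a global pooling layer collapses the variable-size feature map to a fixed-size vector, so a Dense layer can follow.

```python
# Variable-size input -> conv layers -> global pooling -> Dense.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D, Dense

inputs = Input(shape=(None, None, 3))            # height and width unknown
x = Conv2D(64, (3, 3), activation='relu')(inputs)
x = Conv2D(26, (1, 1))(x)                        # spatial size still varies
x = GlobalAveragePooling2D()(x)                  # -> fixed shape (batch, 26)
outputs = Dense(26)(x)                           # fixed input size, so this works

model = Model(inputs, outputs)
print(model.output_shape)                        # (None, 26)
```

As explained below, though, global averaging throws away spatial information, which matters for keypoint prediction.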
Since you want to predict coordinates in the image, global average pooling is not a good choice for you, because it destroys all the positional information. So here are two alternatives that you can consider:
Upvotes: 2