Neural Network with images as input and single pixels output

Question

I'm trying to train a neural network that takes 90x90 images as input and returns a single-pixel image. The dataset consists of 1000 RGB images in a NumPy array, so the input shape is (1000, 90, 90, 3). The target contains, for each image, its brightest pixel as a 1x1 RGB image, so the target shape is (1000, 1, 1, 3).
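For illustration, a target array of that shape could be built roughly as follows. This is only a sketch: the helper name `brightest_pixel_targets` is hypothetical, the random array stands in for the real dataset, and "brightest" is assumed to mean the pixel with the largest sum of channel values.

```python
import numpy as np

def brightest_pixel_targets(images):
    """Return the brightest pixel of each image with shape (N, 1, 1, C)."""
    n, h, w, c = images.shape
    # Assumption: brightness is the sum of a pixel's channel values.
    brightness = images.sum(axis=-1).reshape(n, -1)   # (N, H*W)
    flat_idx = brightness.argmax(axis=1)              # index of brightest pixel per image
    flat_pixels = images.reshape(n, h * w, c)         # (N, H*W, C)
    picked = flat_pixels[np.arange(n), flat_idx]      # (N, C)
    return picked.reshape(n, 1, 1, c)

# Small random stand-in for the real (1000, 90, 90, 3) dataset
rng = np.random.default_rng(0)
X = rng.random((10, 90, 90, 3)).astype("float32")
y = brightest_pixel_targets(X)
print(X.shape, y.shape)  # (10, 90, 90, 3) (10, 1, 1, 3)
```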

Fitting the model seems to work well, with reasonable loss values, but when I run prediction I get 90x90 images as output.

I've tried different kinds of models using Conv2D, Dense, and MaxPooling2D layers, mixing them in different ways with different parameters, but I can't get a 1x1 output.

EDIT: I've tried the approach BestDogeStackoverflow suggested, and I think it works. Here is the network:

# Assumed imports (omitted in the original snippet)
from tensorflow.keras.layers import Conv2D, Input, MaxPooling2D
from tensorflow.keras.models import Model

def create_model():
    # Images are resized to 64x64 so that six 2x2 poolings reach 1x1
    x = Input(shape=(64, 64, 3))
    pool1 = MaxPooling2D((2, 2), padding='same')(x)      # 32x32
    pool2 = MaxPooling2D((2, 2), padding='same')(pool1)  # 16x16
    pool3 = MaxPooling2D((2, 2), padding='same')(pool2)  # 8x8
    pool4 = MaxPooling2D((2, 2), padding='same')(pool3)  # 4x4
    pool5 = MaxPooling2D((2, 2), padding='same')(pool4)  # 2x2
    pool6 = MaxPooling2D((2, 2), padding='same')(pool5)  # 1x1
    # 1x1 convolution over the single remaining pixel (the only trainable layer)
    r = Conv2D(3, (1, 1), activation='linear', padding='same')(pool6)
    model = Model(x, r)
    # Compiled once here; the second, equivalent compile call was redundant
    model.compile(optimizer='adam', loss='mse')
    return model

model = create_model()

model.fit(X_train, y_train,
          batch_size=10,
          epochs=20,
          shuffle=True,
          validation_data=(X_test, y_test))

Now the prediction shape is what I expected:

# The model defined above is named `model`; the other arguments were defaults
predict = model.predict(X_test, verbose=0)

print(predict.shape)

(330, 1, 1, 3)

Is it ok that model.summary() returns this?

Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
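As a sanity check on the total of 12, the count can be reproduced by hand: the pooling layers have no weights, so only the final 1x1 Conv2D contributes. A small sketch, using the standard Conv2D parameter formula and a hypothetical helper name `conv2d_params`:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Number of trainable parameters in a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# Conv2D(3, (1, 1)) applied to a 3-channel input: 1*1*3*3 weights + 3 biases
total = conv2d_params(1, 1, 3, 3)
print(total)  # 12
```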

BestDogeStackoverflow · Accepted Answer

If you want a 1x1 image as output (which really sounds like something nobody would want to do in any circumstances), you need a fully convolutional network whose layers reduce the spatial dimensions. Use layers such as strided convolutions and max pooling to bring the size down to 1. 90x90 doesn't halve evenly, though, so resize your images so that each side is a power of 2 (something like 128x128 or 64x64), then apply enough max pooling layers to reach 1x1 at the output.
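To see how many pooling layers that takes, the side length can be traced through repeated 2x2 pooling. A rough sketch in plain Python, assuming stride 2 and padding='same' (where the output side is the input side divided by 2, rounded up); the helper names are hypothetical:

```python
import math

def pooled_size(size, pool=2):
    """Side length after one pooling layer with stride == pool and padding='same'."""
    return math.ceil(size / pool)

def poolings_to_one(size, pool=2):
    """Return the list of side lengths from `size` down to 1."""
    sizes = [size]
    while sizes[-1] > 1:
        sizes.append(pooled_size(sizes[-1], pool))
    return sizes

sizes_64 = poolings_to_one(64)
sizes_128 = poolings_to_one(128)
print(sizes_64)   # [64, 32, 16, 8, 4, 2, 1]
print(sizes_128)  # [128, 64, 32, 16, 8, 4, 2, 1]
print(len(sizes_64) - 1, "pooling layers needed for 64x64")  # 6
```

So a 64x64 input needs six 2x2 pooling layers and a 128x128 input needs seven.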
