Reputation: 4291
I have developed a 3-layer deep autoencoder model for the MNIST dataset. I am just practicing on this toy dataset, as I am a beginner in this fine-tuning paradigm.
Following is the code:
from keras import layers
from keras.layers import Input, Dense
from keras.models import Model,Sequential
from keras.datasets import mnist
import numpy as np
# Deep Autoencoder
# this is the size of our encoded representations
encoding_dim = 32 # 32 floats -> compression factor 24.5, assuming the input is 784 floats
# this is our input placeholder; 784 = 28 x 28
input_img = Input(shape=(784, ))
my_epochs = 100
# "encoded" is the encoded representation of the inputs
encoded = Dense(encoding_dim * 4, activation='relu')(input_img)
encoded = Dense(encoding_dim * 2, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu')(encoded)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(encoding_dim * 2, activation='relu')(encoded)
decoded = Dense(encoding_dim * 4, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)
# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
# Separate Encoder model
# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)
# Separate Decoder model
# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim, ))
# retrieve the layers of the autoencoder model
decoder_layer1 = autoencoder.layers[-3]
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]
# create the decoder model
decoder = Model(encoded_input, decoder_layer3(decoder_layer2(decoder_layer1(encoded_input))))
# Train to reconstruct MNIST digits
# configure model to use a per-pixel binary crossentropy loss, and the Adadelta optimizer
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# prepare input data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# normalize all values between 0 and 1 and flatten the 28x28 images into vectors of size 784
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
# Train autoencoder for 100 epochs
autoencoder.fit(x_train, x_train, epochs=my_epochs, batch_size=256, shuffle=True, validation_data=(x_test, x_test),
verbose=2)
# after 100 epochs the autoencoder seems to reach a stable train/test loss value
# Visualize the reconstructed encoded representations
# encode and decode some digits
# note that we take them from the *test* set
encodedTrainImages=encoder.predict(x_train)
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
# From here I want to fine tune just the encoder model
model = Sequential()
for layer in encoder.layers:
    model.add(layer)
model.add(layers.Flatten())
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))
Following is my encoder model which I want to fine-tune.
encoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 784) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 100480
_________________________________________________________________
dense_2 (Dense) (None, 64) 8256
_________________________________________________________________
dense_3 (Dense) (None, 32) 2080
=================================================================
Total params: 110,816
Trainable params: 110,816
Non-trainable params: 0
_________________________________________________________________
Problem 1:
After building the autoencoder model, I want to use just the encoder model and fine-tune it for a classification task on the MNIST dataset, but I am getting errors.
Error:
Traceback (most recent call last):
File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-15-528c079e5325>", line 3, in <module>
model.add(layers.Flatten())
File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\sequential.py", line 181, in add
output_tensor = layer(self.outputs[0])
File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\base_layer.py", line 414, in __call__
self.assert_input_compatibility(inputs)
File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\base_layer.py", line 327, in assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer flatten_4: expected min_ndim=3, found ndim=2
Problem 2:
Similarly, I would later like to use a pre-trained model in which each autoencoder is trained in a greedy manner and the final model is then fine-tuned. Can somebody guide me on how to proceed with these two tasks?
regards
Upvotes: 2
Views: 3048
Reputation: 565
The problem is that you are trying to flatten a layer that is already flat: your encoder is made up of one-dimensional Dense layers, which have shape (batch_size, dim).
The Flatten layer expects an input with at least two non-batch dimensions, i.e. a 3-dimensional shape (batch_size, dim1, dim2) such as the output of a Conv2D layer. By removing it, the model will build properly:
encoding_dim = 32
input_img = layers.Input(shape=(784, ))
encoded = layers.Dense(encoding_dim * 4, activation='relu')(input_img)
encoded = layers.Dense(encoding_dim * 2, activation='relu')(encoded)
encoded = layers.Dense(encoding_dim, activation='relu')(encoded)
encoder = Model(input_img, encoded)
[...]
model = Sequential()
for layer in encoder.layers:
    print(layer.name)
    model.add(layer)
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))
model.summary()
Which outputs:
input_1
dense_1
dense_2
dense_3
Model: "sequential_1"
________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 128) 100480
_________________________________________________________________
dense_2 (Dense) (None, 64) 8256
_________________________________________________________________
dense_3 (Dense) (None, 32) 2080
_________________________________________________________________
dense_4 (Dense) (None, 20) 660
_________________________________________________________________
dropout_1 (Dropout) (None, 20) 0
_________________________________________________________________
dense_5 (Dense) (None, 10) 210
=================================================================
Total params: 111,686
Trainable params: 111,686
Non-trainable params: 0
_________________________________________________________________
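From here you can compile and fit the new classifier. Below is a minimal fine-tuning sketch, assuming the x_train/x_test arrays prepared in the question and one-hot encoded labels named y_train_onehot/y_test_onehot (hypothetical names, see the one-hot question below); the optimizers, learning rate and epoch counts are arbitrary illustrative choices, not part of the original code:
from keras.optimizers import Adam
# Stage 1: freeze the three pre-trained encoder Dense layers and train only the new head
for layer in model.layers[:3]:
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train_onehot, epochs=10, batch_size=256,
          validation_data=(x_test, y_test_onehot), verbose=2)
# Stage 2: unfreeze everything and fine-tune end-to-end with a smaller learning rate
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train_onehot, epochs=10, batch_size=256,
          validation_data=(x_test, y_test_onehot), verbose=2)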
___
Q: How can I be sure that the new model will be using the same weights as the previously trained encoder?
A: In your code, you iterate through the layers contained inside the encoder and pass each of them to model.add(). Since you are passing a reference to each layer directly, the new model contains the very same layer objects. Here is a proof of concept using the layer names:
encoding_dim = 32
input_img = Input(shape=(784, ))
encoded = Dense(encoding_dim * 4, activation='relu')(input_img)
encoded = Dense(encoding_dim * 2, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu')(encoded)
decoded = Dense(encoding_dim * 2, activation='relu')(encoded)
decoded = Dense(encoding_dim * 4, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)
autoencoder = Model(input_img, decoded)
print("autoencoder first Dense layer reference:", autoencoder.layers[1])
encoder = Model(input_img, encoded)
print("encoder first Dense layer reference:", encoder.layers[1])
new_model = Sequential()
for i, layer in enumerate(encoder.layers):
    print("Before: ", layer.name)
    new_model.add(layer)
    if i != 0:
        new_model.layers[i-1].name = "new_model_" + layer.name
        print("After: ", layer.name)
Which outputs:
autoencoder first Dense layer reference: <keras.layers.core.Dense object at
0x7fb5f138e278>
encoder first Dense layer reference: <keras.layers.core.Dense object at
0x7fb5f138e278>
Before: input_1
Before: dense_1
After: new_model_dense_1
Before: dense_2
After: new_model_dense_2
Before: dense_3
After: new_model_dense_3
As you can see, the layer references in the encoder and in the autoencoder are the same. What's more, by changing a layer's name inside the new model we also change the name of the encoder's corresponding layer, since they are one and the same object. For more details on Python arguments being passed by reference, check out this answer.
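As an additional sanity check, you can also verify that the trained weights are literally shared between the two models. A minimal sketch, assuming the encoder and new_model built above (note that the Sequential model does not list the InputLayer, so the first Dense layer sits at index 0 there and at index 1 in the functional encoder):
import numpy as np
first_dense_enc = encoder.layers[1]    # first Dense layer of the functional encoder
first_dense_new = new_model.layers[0]  # first Dense layer of the Sequential model
print(first_dense_enc is first_dense_new)  # True: the very same Python object
print(np.array_equal(first_dense_enc.get_weights()[0],
                     first_dense_new.get_weights()[0]))  # True: identical weight matrix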
Q: Do I need one-hot encoding for my data? If so, how?
A: You do need one-hot encoding, since you are dealing with a multi-class classification problem. The encoding is simply done with a handy Keras utility:
from keras.utils import np_utils
one_hot = np_utils.to_categorical(y_train)
Here's a link to the documentation.
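To give a concrete idea of the shapes involved (the variable names below are just for illustration):
from keras.utils import np_utils
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train_onehot = np_utils.to_categorical(y_train, num_classes=10)
y_test_onehot = np_utils.to_categorical(y_test, num_classes=10)
print(y_train.shape)                  # (60000,)
print(y_train_onehot.shape)           # (60000, 10)
print(y_train[0], y_train_onehot[0])  # e.g. 5  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]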
___
Regarding your second question, it is not entirely clear what you are aiming for; however, it seems to me that you want to build an architecture containing several parallel autoencoders, each specialized on a different task, and then concatenate their outputs through some final, common layers.
In any case, what I can suggest for now is to take a look at this guide, which explains how to build multi-input and multi-output models, and to use it as a baseline for your custom implementation.
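As a rough illustration of the kind of architecture the functional API allows (the branch sizes and variable names below are made up for the sketch):
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Two parallel encoder branches over the same 784-dim input
inp = Input(shape=(784,))
branch_a = Dense(64, activation='relu')(inp)
branch_a = Dense(32, activation='relu')(branch_a)
branch_b = Dense(128, activation='relu')(inp)
branch_b = Dense(32, activation='relu')(branch_b)

# Concatenate the two encoded representations and add final, common layers
merged = concatenate([branch_a, branch_b])
out = Dense(20, activation='relu')(merged)
out = Dense(10, activation='softmax')(out)

multi_branch_model = Model(inp, out)
multi_branch_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
multi_branch_model.summary()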
___
Regarding the greedy training task, the approach is to train one layer at a time, freezing all the previous ones as you append new ones. Here is an example for a network with 3(+1) greedily trained layers, which is later used as the base for a new model:
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
from keras.utils import np_utils
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Flatten the 28x28 images into 784-dim vectors and scale pixel values to [0, 1]
x_train = np.reshape(x_train, (x_train.shape[0], -1)).astype('float32') / 255.
x_test = np.reshape(x_test, (x_test.shape[0], -1)).astype('float32') / 255.
model = Sequential()
model.add(Dense(256, activation="relu", kernel_initializer="he_uniform", input_shape=(28*28,)))
model.add(Dense(10, activation="softmax"))
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=1)
# Remove the last (classification) layer
model.pop()
# 'Freeze' the previous layers, so only the new one is trained
for layer in model.layers:
    layer.trainable = False
# Append a new layer + a new classification layer
model.add(Dense(64, activation="relu", kernel_initializer="he_uniform"))
model.add(Dense(10, activation="softmax"))
# Recompile so the new layers and the trainable flags take effect
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=0)
# Remove the last (classification) layer
model.pop()
# 'Freeze' the previous layers, so only the new one is trained
for layer in model.layers:
    layer.trainable = False
# Append a new layer + a new classification layer
model.add(Dense(32, activation="relu", kernel_initializer="he_uniform"))
model.add(Dense(10, activation="softmax"))
# Recompile so the new layers and the trainable flags take effect
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=0)
# Create new model which will use the pre-trained layers
new_model = Sequential()
# Discard the last layer from the previous model
model.pop()
# Optional: you can decide to set the pre-trained layers as trainable, in
# which case it would be like having initialized their weights, or not.
for l in model.layers:
    l.trainable = True
new_model.add(model)
new_model.add(Dense(20, activation='relu'))
new_model.add(Dropout(0.5))
new_model.add(Dense(10, activation='softmax'))
new_model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
new_model.fit(x_train, y_train, batch_size=64, epochs=100, verbose=1)
This is roughly it. However, I must say that greedy layer-wise training may no longer be the proper solution: nowadays ReLU, Dropout and other regularization techniques make greedy layer-wise training an obsolete and time-consuming form of weight initialization, so you might want to take a look at other possibilities as well before going for greedy training.
___
Upvotes: 1