user7039446

Reputation: 1

The encoder weights do not change when training an autoencoder using TensorFlow

The code implementing the autoencoder is shown below:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST (provides mnist.train.next_batch used below)
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# One Layer Autoencoder
# Parameters
learning_rate = 0.01
training_epochs = 20
batch_size = 256
display_step = 1
examples_to_show = 10

# Network Parameters
n_hidden = 128 # 1st layer num features
n_input = 784 # MNIST data input (img shape: 28*28)

# tf Graph input (only pictures)
X = tf.placeholder("float", [None, n_input])

weights = {
    'encoder_h': tf.Variable(tf.random_normal([n_input, n_hidden])),
    'decoder_h': tf.Variable(tf.random_normal([n_hidden, n_input])),
}
biases = {
    'encoder_b': tf.Variable(tf.random_normal([n_hidden])),
    'decoder_b': tf.Variable(tf.random_normal([n_input])),
}

# Building the encoder and decoder
hidden_layer = tf.nn.sigmoid(tf.add(tf.matmul(X, weights["encoder_h"]), biases["encoder_b"]))
out_layer = tf.nn.sigmoid(tf.add(tf.matmul(hidden_layer, weights["decoder_h"]), biases["decoder_b"]))


# Prediction
y_pred = out_layer
# Targets (Labels) are the input data.
y_true = X

# Define loss and optimizer, minimize the squared error
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()

# Session config (allow_soft_placement is an assumption; `config` was not defined in the snippet)
config = tf.ConfigProto(allow_soft_placement=True)
with tf.device("/gpu:0"):
    with tf.Session(config=config) as sess:
        sess.run(init)
        total_batch = int(mnist.train.num_examples/batch_size)
        print([total_batch,batch_size,mnist.train.num_examples])
        for epoch in range(training_epochs):  # one pass over the training set
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # labels are not used
                # Run optimization op (backprop) and cost op (to get loss value)
                _, loss_c = sess.run([optimizer, cost], feed_dict={X: batch_xs})

            if epoch % display_step == 0:
                encoder_w = weights["encoder_h"]
                encoder_w_eval = encoder_w.eval()
                print(encoder_w_eval[0,0])
                decoder_w = weights["decoder_h"]
                decoder_w_eval = decoder_w.eval()
                print(decoder_w_eval[0,0])
                print("Epoch:","%04d"%(epoch+1),
                 "cost=","{:.9f}".format(loss_c))
        print("Optimization Finished!")

When I print the encoder weight, the decoder weight, and the loss during training, the decoder weight and the loss change, but the encoder weight stays the same, as shown below, and I don't know why. Can somebody help?

encoder_w -0.00818192
decoder_w -1.48731
Epoch: 0001 cost= 0.132702485
encoder_w -0.00818192
decoder_w -1.4931
Epoch: 0002 cost= 0.089116640
encoder_w -0.00818192
decoder_w -1.49607
Epoch: 0003 cost= 0.080637991
encoder_w -0.00818192
decoder_w -1.49947
Epoch: 0004 cost= 0.073829792
encoder_w -0.00818192
decoder_w -1.50176
...
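
A small helper to compare the whole encoder matrix between epochs, rather than only the single entry [0, 0], could look like this (a sketch only; weight_change is a hypothetical helper, not part of the script above):

import numpy as np

def weight_change(sess, var, prev_value):
    """Return the variable's current value and the norm of its change since prev_value."""
    current = sess.run(var)
    delta = None if prev_value is None else np.linalg.norm(current - prev_value)
    return current, delta

# Usage once per epoch inside the session, e.g.:
# prev_w, change = weight_change(sess, weights["encoder_h"], prev_w)
# print("encoder weight change norm:", change)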

Upvotes: 0

Views: 622

Answers (2)

I. A

Reputation: 2312

The weights always behave in that manner; that is, they always have a Gaussian distribution. Note that your input could follow any distribution in high dimensions. In addition, if you combine many different types of distributions you end up with a Gaussian distribution (this comes from probability theory, the central limit theorem). As a result, the distribution of the weights will somehow generalize and will continue to follow a Gaussian distribution. Note also that with Batch Normalization the aim is to force the outputs of the activation functions to follow a Gaussian distribution. This is the general intuition.

In addition, L2 regularization is in some way forcing the weights to follow a Gaussian distribution.
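
As an illustration, an L2 penalty could be added to the cost defined in the question like this (a sketch only; the coefficient beta is an assumed value):

beta = 1e-4  # assumed regularization strength; tune as needed
l2_penalty = tf.nn.l2_loss(weights["encoder_h"]) + tf.nn.l2_loss(weights["decoder_h"])
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2)) + beta * l2_penalty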

Finally, printing the weights the way you are doing it is wrong; rather, you should use TensorBoard.
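
For example, histogram summaries can be attached to the weight variables from the question and inspected in TensorBoard (a minimal sketch; the log directory ./logs is arbitrary):

# At graph-construction time: attach histogram summaries to the weights
tf.summary.histogram("encoder_h", weights["encoder_h"])
tf.summary.histogram("decoder_h", weights["decoder_h"])
merged = tf.summary.merge_all()

# Create one writer after the session starts
writer = tf.summary.FileWriter("./logs", sess.graph)

# Then, once per epoch inside the training loop:
summary = sess.run(merged)
writer.add_summary(summary, epoch)
# View with: tensorboard --logdir ./logs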

Hope this answer helps.

Upvotes: 1

Alexandre Passos

Reputation: 5206

In general I recommend inspecting the graph in TensorBoard to make sure that it looks like you expect it to (for example, that there are gradient update ops for the encoder weights).

In your case it could be that encoder_w[0, 0] doesn't change much because its gradients happen to be small, and the learning rate is small as well.
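
One way to check this is to evaluate the gradient of the cost with respect to the encoder weights directly (a minimal sketch based on the variables defined in the question):

# Gradient of the cost with respect to the full encoder weight matrix
encoder_grad = tf.gradients(cost, [weights["encoder_h"]])[0]

# Inside the session, on any batch of inputs:
grad_val = sess.run(encoder_grad, feed_dict={X: batch_xs})
print("gradient at [0, 0]:", grad_val[0, 0])
print("mean absolute gradient:", abs(grad_val).mean())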

Upvotes: 0
