Reputation: 21
I am trying to implement a variational autoencoder using Python and TensorFlow. I have seen various implementations on the internet, and I built my own from the parts I found, adapted to my specific case. I ended up with the autoencoder here: my autoencoder on git
Briefly I have an autoencoder that contains:
1) an encoder with 2 convolutional layers and 1 flatten layer,
2) the latent space ( of dimension 2),
3) and a decoder with the reverse parts of the encoder.
My problem is when I try to implement the variational part of the autoencoder, by which I mean the mathematical procedure in the latent space. At least that is where I pinpoint the problem.
To be clearer, I have the following 2 cases:
Case 1: Without implementing any variational math, I simply set the variables in the latent space and feed them into the decoder with no math applied. In that case the cost function is just the difference between input and output. You can see the code for that case in these figures on the git (sorry, I cannot post more links): figure1_code_part1.png, figure1_code_part2.png
Case 2: Implementing the math on the latent-space variables. You can see the code for that case in these figures: figure_2_code_part1.png, figure_2_code_part2.png
The plots of the latent space I get in the two cases are: figure_1.png figure_2.png
I think something is clearly wrong with the variational implementation, but I can't figure out what. Everyone who implements a variational autoencoder uses these mathematical formulas (at least the ones I found on the internet), so I am probably missing something.
Any comments/suggestions are welcome. Thanks!!!
Upvotes: 1
Views: 914
Reputation: 2312
Here is how mu and sigma, together with the KL_term, should be calculated. I am not sure about the linear part of your code, so I suggest the following:
Please note that here, before the fully connected layers on the encoder side, I have a conv4 layer of shape [7, 7, 256].
# These are the weights and biases of the mu and sigma layers on the encoder side
w_c_mu = tf.Variable(tf.truncated_normal([7 * 7 * 256, latent_dim], stddev=0.1), name='weight_fc_mu')
b_c_mu = tf.Variable(tf.constant(0.1, shape=[latent_dim]), name='biases_fc_mu')
w_c_sig = tf.Variable(tf.truncated_normal([7 * 7 * 256, latent_dim], stddev=0.1), name='weight_fc_sig')
b_c_sig = tf.Variable(tf.constant(0.1, shape=[latent_dim]), name='biases_fc_sig')

with tf.variable_scope('mu'):
    # conv4_reshaped is conv4 flattened to [batch, 7 * 7 * 256]
    mu = tf.nn.bias_add(tf.matmul(conv4_reshaped, w_c_mu), b_c_mu)
    tf.summary.histogram('mu', mu)

with tf.variable_scope('stddev'):
    # Note: this layer actually outputs log(sigma^2), not sigma itself;
    # the name is kept for consistency with the rest of the graph.
    stddev = tf.nn.bias_add(tf.matmul(conv4_reshaped, w_c_sig), b_c_sig)
    tf.summary.histogram('stddev', stddev)

with tf.variable_scope('z'):
    # Reparameterization trick, adopted from the following paper:
    # http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7979344
    # Draw independent noise for every example in the batch.
    epsilon = tf.random_normal(tf.shape(mu))
    latent_var = mu + tf.multiply(tf.sqrt(tf.exp(stddev)), epsilon)
    tf.summary.histogram('features_sig', stddev)
...
with tf.name_scope('loss_KL'):
    # KL(N(mu, sigma^2) || N(0, 1)), with stddev holding log(sigma^2):
    # KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    temp2 = 1 + stddev - tf.square(mu) - tf.exp(stddev)
    KL_term = -0.5 * tf.reduce_sum(temp2, axis=1)
    tf.summary.scalar('KL_term', tf.reduce_mean(KL_term))

with tf.name_scope('total_loss'):
    # log_likelihood here is the reconstruction loss (a negative log-likelihood),
    # so minimizing this sum maximizes the evidence lower bound.
    variational_lower_bound = tf.reduce_mean(log_likelihood + KL_term)
    tf.summary.scalar('loss', variational_lower_bound)

with tf.name_scope('optimizer'):
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        optimizer = tf.train.AdamOptimizer(0.00001).minimize(variational_lower_bound)
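As a sanity check, the reparameterization step and the KL term can be verified numerically outside the graph. Below is a minimal NumPy sketch (the values of mu and log_var are made up for illustration) that follows the same convention as above, where the second dense layer outputs log(sigma^2):

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 2
mu = np.array([0.5, -1.0])       # pretend encoder mean output
log_var = np.array([0.2, -0.3])  # pretend encoder log-variance output

# Reparameterization trick: z = mu + sigma * epsilon with epsilon ~ N(0, I),
# where sigma = sqrt(exp(log_var)).
epsilon = rng.standard_normal(latent_dim)
z = mu + np.sqrt(np.exp(log_var)) * epsilon

# KL divergence between N(mu, sigma^2) and the standard normal prior,
# summed over the latent dimensions:
# KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

print(z)   # one latent sample
print(kl)  # always >= 0, and 0 only when mu = 0 and sigma = 1
```

If you feed mu = 0 and log_var = 0 into the formula you get kl = 0, which is a quick way to confirm the signs are right.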
For the full code: https://gist.github.com/issa-s-ayoub/5267558c4f5694d479a84d960c265452
Hope that helps!
Upvotes: 2