daniel451

Reputation: 11002

TensorFlow DCGAN model: stability and convergence problems

I have built my own implementation of a DCGAN in TensorFlow, training on MNIST.

The full, runnable code is available on GitHub: https://github.com/Daniel451/tfdcgan

Feel free to submit pull requests :)

While the model does try to learn to generate MNIST samples, its stability is very poor and its convergence is slow (even after 10 epochs the generated samples still look very artificial).

Interestingly, I had first implemented the very same model in Keras (with the TensorFlow backend), and that version works as expected: it learns reasonable filters and the Generator produces nice MNIST samples when fed with standard normal noise.

I suspect a problem with the loss functions or the model configuration, but I was unable to pin down the exact cause.

Another strange thing I've noticed is that the TensorFlow implementation needs the Discriminator's output to have a shape of (batch_size, 2). Thus, I'm encoding generator/fake images as [0, 1] and real training images as [1, 0].

My expectation was that only tf.nn.sparse_softmax_cross_entropy_with_logits would need this shape, since it requires sparse labels. However, even tf.nn.softmax_cross_entropy_with_logits and tf.nn.sigmoid_cross_entropy_with_logits do not return useful loss values when the Discriminator's output has shape (batch_size, 1), with the encoding simply being 0.0 for generator/fake images and 1.0 for real training images.
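For reference, this is roughly the single-output formulation I expected to work (just a sketch with hypothetical names, not my actual code):

# d_logits: hypothetical (batch_size, 1) logit tensor from the Discriminator
# labels: 1.0 for real training images, 0.0 for generator/fake images
y = tf.placeholder(tf.float32, shape=[None, 1], name="y_single")
d_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits, labels=y))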

The Keras implementation works well with different loss functions and a single output neuron for the Discriminator.

This is the Generator's (G) model:

def model_generator(self, Z, reuse=True):

        init_op = tf.contrib.layers.xavier_initializer(uniform=True, dtype=tf.float32)

        with tf.variable_scope("g", initializer=init_op, reuse=reuse, dtype=tf.float32):

            with tf.variable_scope("reshape"):
                out = tf.layers.dense(Z, 7 * 7 * 256, activation=None)
                out = tf.reshape(out, [-1, 7, 7, 256])
                out = tf.layers.batch_normalization(out)
                out = tf.nn.tanh(out)

            with tf.variable_scope("deconv1"):
                out = tf.layers.conv2d_transpose(out, 128, [3, 3], strides=[2, 2], padding="same")
                out = tf.layers.batch_normalization(out)
                out = tf.nn.tanh(out)

            with tf.variable_scope("deconv2"):
                out = tf.layers.conv2d_transpose(out, 64, [3, 3], strides=[2, 2], padding="same")
                out = tf.layers.batch_normalization(out)
                out = tf.nn.tanh(out)

            with tf.variable_scope("output"):
                out = tf.layers.conv2d_transpose(out, 1, [5, 5], strides=[1, 1], padding="same")
                logits = out
                output = tf.nn.tanh(out)

        return output, logits

...and this is the Discriminator's (D) model:

def model_discriminator(self, X, reuse=True, trainable=True):

        init_op = tf.contrib.layers.xavier_initializer(uniform=False, dtype=tf.float32)

        with tf.variable_scope("d", initializer=init_op, reuse=reuse, dtype=tf.float32):

            with tf.variable_scope("conv1"):
                out = tf.layers.conv2d(X, 64, [5, 5], strides=[2, 2], padding="same",
                                       trainable=trainable)
                out = tf.nn.tanh(out)

            with tf.variable_scope("conv2"):
                out = tf.layers.conv2d(out, 128, [3, 3], strides=[2, 2], padding="same",
                                       trainable=trainable)
                out = tf.nn.tanh(out)

            with tf.variable_scope("output"):
                out = tf.reshape(out, [-1, 7 * 7 * 128])
                out = tf.layers.dense(out, 2, activation=None, trainable=trainable)
                logits = out
                output = tf.sigmoid(out)

        return output, logits

I have tried each of these loss functions:

self.d_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.D_logits, labels=self.Y))
self.dg_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.DG_logits, labels=self.Y))

self.d_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.D_logits, labels=self.Y))
self.dg_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.DG_logits, labels=self.Y))

self.d_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.D_logits, labels=self.Y))
self.dg_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.DG_logits, labels=self.Y))
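To summarize the label formats each of these losses expects (my own illustration, not code from the repo):

# sparse_softmax_cross_entropy_with_logits: integer class indices, shape (batch_size,)
y_sparse = tf.constant([0, 1, 1, 0], dtype=tf.int32)
# softmax_cross_entropy_with_logits: one-hot labels, same shape as the logits, here (batch_size, 2)
y_onehot = tf.one_hot(y_sparse, depth=2)
# sigmoid_cross_entropy_with_logits: float labels, same shape as the logits (single output neuron)
y_float = tf.constant([[1.0], [0.0], [0.0], [1.0]])  # 1.0 = real, 0.0 = fake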

Here are the corresponding training operations:

self.d_train_op = tf.train.AdamOptimizer(learning_rate=2e-4, beta1=0.5, beta2=0.999, name="Adam_D")\
            .minimize(self.d_loss, var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="d"))
self.g_train_op = tf.train.AdamOptimizer(learning_rate=2e-4, beta1=0.5, beta2=0.999, name="Adam_DG")\
            .minimize(self.dg_loss, var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="g"))

...beta1=0.5 is the value suggested in the DCGAN paper, and var_list=... ensures that only D or only G is updated in each step, never both.
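The split can be verified quickly with something like this (only a sketch, using the variable scopes "d" and "g" from above):

# list the variables each optimizer will update, to verify the D/G split
d_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="d")
g_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="g")
for v in d_vars + g_vars:
    print(v.name, v.shape)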

I have normalized the MNIST input images to the interval [-1.0, 1.0], as suggested by several sources.
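The normalization itself is roughly this (a sketch; x being a batch of raw uint8 MNIST pixels):

import numpy as np

# scale pixel values from [0, 255] to [-1.0, 1.0]
x = x.astype(np.float32)
x = x / 127.5 - 1.0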

The instantiation of self.G (Generator; for predictions), self.D (Discriminator; for classification), and self.DG (Generator + Discriminator, in order to train the Generator) looks like this:

# placeholder for noise Z, fed into G
self.Z = tf.placeholder(tf.float32, shape=[None, 100], name="Z")
# placeholder for X, image data fed into D
self.X = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name="X")
# placeholder for Y, labels for training
self.Y = tf.placeholder(tf.int32, shape=[None], name="Y")

self.G, self.G_logits = self.model_generator(self.Z, reuse=False)
self.D, self.D_logits = self.model_discriminator(self.X, reuse=False)
self.DG, self.DG_logits = self.model_discriminator(self.G, trainable=False)

I am training the DCGAN in 3 steps per batch (a rough sketch of the loop follows the list):

  1. train D with real images
  2. train D with generator/fake images
  3. train G
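Roughly, one training iteration looks like this (a sketch with hypothetical helpers, not the exact code from the repository):

import numpy as np

# hypothetical helper: next_real_batch() yields MNIST images scaled to [-1, 1]
x_real = next_real_batch(batch_size)
z = np.random.standard_normal((batch_size, 100)).astype(np.float32)

# sparse label encodings (assuming real = class 0, i.e. [1, 0], and fake = class 1, i.e. [0, 1])
y_real = np.zeros(batch_size, dtype=np.int32)
y_fake = np.ones(batch_size, dtype=np.int32)

# 1. train D with real images
sess.run(self.d_train_op, feed_dict={self.X: x_real, self.Y: y_real})
# 2. train D with generator/fake images
x_fake = sess.run(self.G, feed_dict={self.Z: z})
sess.run(self.d_train_op, feed_dict={self.X: x_fake, self.Y: y_fake})
# 3. train G through D, labelling the fakes as "real" so G learns to fool D
sess.run(self.g_train_op, feed_dict={self.Z: z, self.Y: y_real})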

Any ideas on why this net performs badly?

Upvotes: 1

Views: 846

Answers (1)

David Haldimann

Reputation: 1

One thing I noticed in your question is that you mention training in 3 steps. Usually one trains in one or two steps: either you train the discriminator and the generator separately (two steps) or together (one step).

When training the discriminator, you get the discriminator's output for a real and for a fake sample, compute the loss for both, and apply the gradients once.

If you train in one step, you need to make sure that the gradients are applied in the correct order.
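As a rough sketch (with hypothetical tensor names, not your exact variables), a single discriminator update over both real and fake samples could look like this:

# d_logits_real / d_logits_fake: hypothetical (batch_size, 1) logits for real and generated images
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_real, labels=tf.ones_like(d_logits_real)))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_fake, labels=tf.zeros_like(d_logits_fake)))
d_loss = d_loss_real + d_loss_fake  # one combined loss, gradients applied once
d_train_op = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(d_loss, var_list=d_vars)  # d_vars: D's variables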

Upvotes: 0
