daniel451

Reputation: 11002

TensorFlow DCGAN model: stability and convergence problems

I have built my own implementation of a DCGAN in TensorFlow, training on MNIST.

The full, runnable code is available on GitHub: https://github.com/Daniel451/tfdcgan

Feel free to submit pull requests :)

While the model does try to learn to generate MNIST samples, its stability is very poor and its convergence is slow (even after 10 epochs the generated samples still look very artificial).

Interestingly, I had first implemented the very same model in Keras (with the TensorFlow backend), and that version works as expected: it learns reasonable filters and the Generator produces nice MNIST samples when fed with standard normal noise.

I suspect a problem with the loss functions or the model configuration, but I was unable to pin down the exact cause.

Another strange thing I've noticed is that the TensorFlow implementation needs the Discriminator's output to have a shape of (batch_size, 2). Thus, I'm encoding generator/fake images as [0, 1] and real training images as [1, 0].

My expectation was that only tf.nn.sparse_softmax_cross_entropy_with_logits would need this shape, since it requires sparse labels. However, even tf.nn.softmax_cross_entropy_with_logits and tf.nn.sigmoid_cross_entropy_with_logits do not return useful loss values when the Discriminator's output has shape (batch_size, 1), with the encoding simply being 0.0 for generator/fake images and 1.0 for real training images.
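For reference, this is roughly the single-output formulation I expected to work (just a sketch with hypothetical names, not my actual code):

# d_logits: hypothetical (batch_size, 1) logit tensor from the Discriminator
# labels: 1.0 for real training images, 0.0 for generator/fake images
y = tf.placeholder(tf.float32, shape=[None, 1], name="y_single")
d_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits, labels=y))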

The Keras implementation works well with different loss functions and a single output neuron for the Discriminator.

This is the Generator's (G) model:

def model_generator(self, Z, reuse=True):

        init_op = tf.contrib.layers.xavier_initializer(uniform=True, dtype=tf.float32)

        with tf.variable_scope("g", initializer=init_op, reuse=reuse, dtype=tf.float32):

            with tf.variable_scope("reshape"):
                out = tf.layers.dense(Z, 7 * 7 * 256, activation=None)
                out = tf.reshape(out, [-1, 7, 7, 256])
                out = tf.layers.batch_normalization(out)
                out = tf.nn.tanh(out)

            with tf.variable_scope("deconv1"):
                out = tf.layers.conv2d_transpose(out, 128, [3, 3], strides=[2, 2], padding="same")
                out = tf.layers.batch_normalization(out)
                out = tf.nn.tanh(out)

            with tf.variable_scope("deconv2"):
                out = tf.layers.conv2d_transpose(out, 64, [3, 3], strides=[2, 2], padding="same")
                out = tf.layers.batch_normalization(out)
                out = tf.nn.tanh(out)

            with tf.variable_scope("output"):
                out = tf.layers.conv2d_transpose(out, 1, [5, 5], strides=[1, 1], padding="same")
                logits = out
                output = tf.nn.tanh(out)

        return output, logits

...and this is the Discriminator's (D) model:

def model_discriminator(self, X, reuse=True, trainable=True):

        init_op = tf.contrib.layers.xavier_initializer(uniform=False, dtype=tf.float32)

        with tf.variable_scope("d", initializer=init_op, reuse=reuse, dtype=tf.float32):

            with tf.variable_scope("conv1"):
                out = tf.layers.conv2d(X, 64, [5, 5], strides=[2, 2], padding="same",
                                       trainable=trainable)
                out = tf.nn.tanh(out)

            with tf.variable_scope("conv2"):
                out = tf.layers.conv2d(out, 128, [3, 3], strides=[2, 2], padding="same",
                                       trainable=trainable)
                out = tf.nn.tanh(out)

            with tf.variable_scope("output"):
                out = tf.reshape(out, [-1, 7 * 7 * 128])
                out = tf.layers.dense(out, 2, activation=None, trainable=trainable)
                logits = out
                output = tf.sigmoid(out)

        return output, logits

I have tried each of these loss functions:

self.d_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.D_logits, labels=self.Y))
self.dg_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.DG_logits, labels=self.Y))

self.d_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.D_logits, labels=self.Y))
self.dg_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.DG_logits, labels=self.Y))

self.d_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.D_logits, labels=self.Y))
self.dg_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.DG_logits, labels=self.Y))
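To summarize the label formats each of these losses expects (my own illustration, not code from the repo):

# sparse_softmax_cross_entropy_with_logits: integer class indices, shape (batch_size,)
y_sparse = tf.constant([0, 1, 1, 0], dtype=tf.int32)
# softmax_cross_entropy_with_logits: one-hot labels, same shape as the logits, here (batch_size, 2)
y_onehot = tf.one_hot(y_sparse, depth=2)
# sigmoid_cross_entropy_with_logits: float labels, same shape as the logits (single output neuron)
y_float = tf.constant([[1.0], [0.0], [0.0], [1.0]])  # 1.0 = real, 0.0 = fake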

Here are the corresponding training operations:

self.d_train_op = tf.train.AdamOptimizer(learning_rate=2e-4, beta1=0.5, beta2=0.999, name="Adam_D")\
            .minimize(self.d_loss, var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="d"))
self.g_train_op = tf.train.AdamOptimizer(learning_rate=2e-4, beta1=0.5, beta2=0.999, name="Adam_DG")\
            .minimize(self.dg_loss, var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="g"))

...beta1=0.5 is the value suggested in the DCGAN paper, and var_list=... ensures that only D or only G is updated in each step, never both.
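The split can be verified quickly with something like this (only a sketch, using the variable scopes "d" and "g" from above):

# list the variables each optimizer will update, to verify the D/G split
d_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="d")
g_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="g")
for v in d_vars + g_vars:
    print(v.name, v.shape)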

I have normalized the MNIST input images to the interval [-1.0, 1.0], as suggested by several sources.
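The normalization itself is roughly this (a sketch; x being a batch of raw uint8 MNIST pixels):

import numpy as np

# scale pixel values from [0, 255] to [-1.0, 1.0]
x = x.astype(np.float32)
x = x / 127.5 - 1.0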

The instantiation of self.G (Generator; for predictions), self.D (Discriminator; for classification), and self.DG (Generator + Discriminator, in order to train the Generator) looks like this:

# placeholder for noise Z, fed into G
self.Z = tf.placeholder(tf.float32, shape=[None, 100], name="Z")
# placeholder for X, image data fed into D
self.X = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name="X")
# placeholder for Y, labels for training
self.Y = tf.placeholder(tf.int32, shape=[None], name="Y")

self.G, self.G_logits = self.model_generator(self.Z, reuse=False)
self.D, self.D_logits = self.model_discriminator(self.X, reuse=False)
self.DG, self.DG_logits = self.model_discriminator(self.G, trainable=False)

I am training the DCGAN in 3 steps per batch (a rough sketch of the loop follows the list):

  1. train D with real images
  2. train D with generator/fake images
  3. train G
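Roughly, one training iteration looks like this (a sketch with hypothetical helpers, not the exact code from the repository):

import numpy as np

# hypothetical helper: next_real_batch() yields MNIST images scaled to [-1, 1]
x_real = next_real_batch(batch_size)
z = np.random.standard_normal((batch_size, 100)).astype(np.float32)

# sparse label encodings (assuming real = class 0, i.e. [1, 0], and fake = class 1, i.e. [0, 1])
y_real = np.zeros(batch_size, dtype=np.int32)
y_fake = np.ones(batch_size, dtype=np.int32)

# 1. train D with real images
sess.run(self.d_train_op, feed_dict={self.X: x_real, self.Y: y_real})
# 2. train D with generator/fake images
x_fake = sess.run(self.G, feed_dict={self.Z: z})
sess.run(self.d_train_op, feed_dict={self.X: x_fake, self.Y: y_fake})
# 3. train G through D, labelling the fakes as "real" so G learns to fool D
sess.run(self.g_train_op, feed_dict={self.Z: z, self.Y: y_real})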

Any ideas on why this net performs badly?

Upvotes: 1

Views: 846

Answers (1)

David Haldimann

Reputation: 1

One thing I noticed in your question is that you mention training in 3 steps. Usually one trains in one or two steps: either you train the discriminator and the generator separately (two steps) or together (one step).

When training the discriminator, you get the discriminator's output for a real and for a fake sample, compute the loss for both, and apply the gradients once.

If you train in one step, you need to make sure that the gradients are applied in the correct order.
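As a rough sketch (with hypothetical tensor names, not your exact variables), a single discriminator update over both real and fake samples could look like this:

# d_logits_real / d_logits_fake: hypothetical (batch_size, 1) logits for real and generated images
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_real, labels=tf.ones_like(d_logits_real)))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_fake, labels=tf.zeros_like(d_logits_fake)))
d_loss = d_loss_real + d_loss_fake  # one combined loss, gradients applied once
d_train_op = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(d_loss, var_list=d_vars)  # d_vars: D's variables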

Upvotes: 0
