ParmuTownley

Reputation: 1007

GAN not converging. Discriminator loss keeps increasing

I am making a simple generative adversarial network (GAN) on the MNIST dataset.

This is my implementation:

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)

def noise(batch_size):
    return np.random.uniform(-1, 1, (batch_size, 100))

learning_rate = 0.001
batch_size = 128

input = tf.placeholder('float', [None, 100])
real_data = tf.placeholder('float', [None, 784])

def generator(x):
    weights = {
        'hl1' : tf.Variable(tf.random_normal([100, 200])),
        'ol'  : tf.Variable(tf.random_normal([200, 784]))
    }
    biases = {
        'hl1' : tf.Variable(tf.random_normal([200])),
        'ol'  : tf.Variable(tf.random_normal([784]))
    }

    hl1 = tf.add(tf.matmul(x, weights['hl1']), biases['hl1'])
    ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))

    return ol


def discriminator(x):
    weights = {
        'hl1' : tf.Variable(tf.random_normal([784, 200])),
        'ol'  : tf.Variable(tf.random_normal([200, 1]))
    }
    biases = {
        'hl1' : tf.Variable(tf.random_normal([200])),
        'ol'  : tf.Variable(tf.random_normal([1]))
    }

    hl1 = tf.add(tf.matmul(x, weights['hl1']), biases['hl1'])
    ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))

    return ol

with tf.variable_scope("G"):
    G = generator(input)

with tf.variable_scope("D"):
    D_real = discriminator(real_data)

with tf.variable_scope("D", reuse = True):
    D_gen = discriminator(G)

generator_parameters = [x for x in tf.trainable_variables() if x.name.startswith('G/')]
discriminator_parameters = [x for x in tf.trainable_variables() if x.name.startswith('D/')]

G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_gen, labels=tf.ones_like(D_gen)))
D_real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real, labels=tf.ones_like(D_real)))
D_fake_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_gen, labels=tf.zeros_like(D_gen)))
D_total_loss = tf.add(D_fake_loss, D_real_loss)

G_train = tf.train.AdamOptimizer(learning_rate).minimize(G_loss,var_list=generator_parameters)
D_train = tf.train.AdamOptimizer(learning_rate).minimize(D_total_loss,var_list=discriminator_parameters)

sess = tf.Session()
init = tf.global_variables_initializer()

sess.run(init)

loss_g_function = []
loss_d_function = []

for epoch in range(200):
    for iteration in range(int(len(mnist.train.images)/batch_size)):
        real_batch, _ = mnist.train.next_batch(batch_size)

        _, d_err = sess.run([D_train, D_total_loss], feed_dict = {real_data : real_batch, input : noise(batch_size)})
        _, g_err = sess.run([G_train, G_loss], feed_dict = {input : noise(batch_size)})

    print("Epoch = ", epoch)
    print("D_loss = ", d_err)
    print("G_loss = ", g_err)
    loss_g_function.append(g_err)
    loss_d_function.append(d_err)

# Visualizing
import matplotlib.pyplot as plt

test_noise = noise(1)

plt.subplot(2, 2, 1)
plt.plot(test_noise[0])
plt.title("Noise")
plt.subplot(2, 2, 2)
plt.imshow(np.reshape(sess.run(G, feed_dict = {input : test_noise})[0], [28, 28]))
plt.title("Generated Image")
plt.subplot(2, 2, 3)
plt.plot(loss_d_function, 'r')
plt.xlabel("Epochs")
plt.ylabel("Discriminator Loss")
plt.title("D-Loss")
plt.subplot(2, 2, 4)
plt.plot(loss_g_function, 'b')
plt.xlabel("Epochs")
plt.ylabel("Generator Loss")
plt.title("G_Loss")
plt.show()

I have tried lr = 0.001, lr = 0.0001, and lr = 0.00003.

These are my results: https://i.sstatic.net/NXA0H.jpg

What could be the reason? My weights are initialized randomly from a normal distribution. Also, please check the loss functions; are they correct?

Upvotes: 2

Views: 6450

Answers (1)

Vijay Mariappan

Reputation: 17191

Issues:


The network effectively has just a single layer:

hl1 = tf.add(tf.matmul(x, weights['hl1']), biases['hl1'])    
ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))

The network above, which is used for both the discriminator and the generator, has no activation on its first layer. That means the whole network collapses into a single linear layer: y = act(w2(x*w1 + b1) + b2) = act(x*w + b).
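A minimal sketch of that fix, assuming tf.nn.leaky_relu as the hidden nonlinearity (tf.nn.relu would work just as well); only the hidden-layer line changes relative to the question's generator:

def generator(x):
    weights = {
        'hl1' : tf.Variable(tf.random_normal([100, 200])),
        'ol'  : tf.Variable(tf.random_normal([200, 784]))
    }
    biases = {
        'hl1' : tf.Variable(tf.random_normal([200])),
        'ol'  : tf.Variable(tf.random_normal([784]))
    }

    # Nonlinearity on the hidden layer, so the two matmuls no longer
    # collapse into a single linear transformation.
    hl1 = tf.nn.leaky_relu(tf.add(tf.matmul(x, weights['hl1']), biases['hl1']))
    ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))

    return ol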


Sigmoid applied twice:

ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']) ...
D_real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(...)

As mentioned in the comments, the sigmoid activation is applied twice: once in the discriminator's output layer and again inside sigmoid_cross_entropy_with_logits, which expects raw logits.
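One way to fix it, sketched below under the same TF 1.x setup as the question, is to have the discriminator return the raw logit and let sigmoid_cross_entropy_with_logits apply the sigmoid itself:

def discriminator(x):
    weights = {
        'hl1' : tf.Variable(tf.random_normal([784, 200])),
        'ol'  : tf.Variable(tf.random_normal([200, 1]))
    }
    biases = {
        'hl1' : tf.Variable(tf.random_normal([200])),
        'ol'  : tf.Variable(tf.random_normal([1]))
    }

    hl1 = tf.nn.leaky_relu(tf.add(tf.matmul(x, weights['hl1']), biases['hl1']))
    # No tf.nn.sigmoid here: the loss functions apply the sigmoid internally,
    # so the discriminator returns a raw logit.
    ol = tf.add(tf.matmul(hl1, weights['ol']), biases['ol'])

    return ol

The loss definitions from the question can then stay as they are, since they already pass these outputs as logits.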


Weight initializations:

tf.Variable(tf.random_normal([784, 200]))

With a sigmoid activation, if the weights are large the gradients become very small, which means the weights effectively stop changing (a big w plus a very small delta(w)). This may be why, when I run the above code, the loss barely changes. It is better to follow standard practice and use something like xavier_initializer().
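For example, assuming a TF 1.x build with tf.contrib available, the variables inside discriminator() could be created with a Xavier/Glorot initializer via tf.get_variable (which also cooperates with the reuse=True variable scope already used in the question):

xavier = tf.contrib.layers.xavier_initializer()

# Inside discriminator(x); names and shapes follow the question's code.
w_hl1 = tf.get_variable('w_hl1', [784, 200], initializer=xavier)
b_hl1 = tf.get_variable('b_hl1', [200], initializer=tf.zeros_initializer())
w_ol  = tf.get_variable('w_ol', [200, 1], initializer=xavier)
b_ol  = tf.get_variable('b_ol', [1], initializer=tf.zeros_initializer())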


Dynamic range inconsistencies: The input to the generator is in the range [-1, 1] and gets multiplied by weights of roughly the same range, yet the output is squashed into [0, 1]. There is nothing wrong with this in itself, since a bias can learn to shift the output range, but it is better to use an output activation that produces [-1, 1], such as tanh, so the network can learn faster. If tanh is used as the generator's output activation, then the real images fed to the discriminator must be scaled to [-1, 1] for consistency.
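A sketch of what that could look like, assuming the MNIST pixels from input_data (which lie in [0, 1]) are rescaled before being fed to the discriminator:

# Generator output layer with tanh, so fake samples lie in [-1, 1]:
ol = tf.nn.tanh(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))

# Rescale the real MNIST batch from [0, 1] to [-1, 1] before feeding it in:
real_batch, _ = mnist.train.next_batch(batch_size)
real_batch = 2.0 * real_batch - 1.0
_, d_err = sess.run([D_train, D_total_loss],
                    feed_dict = {real_data : real_batch, input : noise(batch_size)})

When visualizing, the generated samples would then need to be mapped back from [-1, 1] to [0, 1] (for example (img + 1) / 2) before imshow.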


With the above changes, you can get something similar to:

[result image]

The above network is a very simple one, and the output quality is not great. I have deliberately kept its complexity unchanged to see what kind of output such a simple network can produce.

You can build a bigger network (including CNN layers) and also try out more recent GAN models to obtain better-quality results.


Code for reproducing the above can be obtained from here.

Upvotes: 3
