Use neural network to learn a square wave function

Question

Out of curiosity, I am trying to build a simple fully connected NN using tensorflow to learn a square wave function such as the following one:

The input is therefore a 1D array of x values (the horizontal axis), and the output is a binary scalar. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation. The network has 3 hidden layers (100*100*100), a single input node, and a single output node. The input data are generated to match the wave shape above, so data size is not a problem.
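For reference, the target can be written as a small function. This is a minimal sketch that assumes ten bins of width 0.1 on [0, 1), with label 0 on even bins and 1 on odd bins, which matches the data generation in the full code posted further down; the name square_wave and the parameter n_bins are made up for illustration.

```python
import numpy as np

def square_wave(x, n_bins=10):
    """Label 0 on even-numbered bins and 1 on odd-numbered bins of width 1/n_bins."""
    bin_index = np.floor(np.asarray(x) * n_bins).astype(int)
    return bin_index % 2

# Points chosen in the middle of bins 0, 1, 2 and 9.
x = np.array([0.05, 0.15, 0.25, 0.95])
y = square_wave(x)
print(y)
```

Points that sit exactly on a bin edge can be affected by floating-point rounding, so the sketch only uses mid-bin values.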

However, the trained model seems to fail completely, always predicting the negative class.

So I am trying to figure out why this happens: is the NN configuration suboptimal, or is there some underlying mathematical limitation of NNs (though I think a NN should be able to approximate any function)?

Thanks.

As suggested in the comments, here is the full code. One thing I stated incorrectly earlier: there are actually 2 output nodes (one per output class):

"""
    See if neural net can find piecewise linear correlation in the data
"""

import time
import os
import tensorflow as tf
import numpy as np

def generate_placeholder(batch_size):
    x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))
    y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))
    return x_placeholder, y_placeholder

def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):
    x_selected = [[None]] * batch_size
    y_selected = [None] * batch_size
    for i in range(batch_size):
        x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
        y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
    feed_dict = {x_placeholder: x_selected,
                 y_placeholder: y_selected}
    return feed_dict

def inference(input_x, H1_units, H2_units, H3_units):

    with tf.name_scope('H1'):
        weights = tf.Variable(tf.truncated_normal([1, H1_units], stddev=1.0/2), name='weights') 
        biases = tf.Variable(tf.zeros([H1_units]), name='biases')
        a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)

    with tf.name_scope('H2'):
        weights = tf.Variable(tf.truncated_normal([H1_units, H2_units], stddev=1.0/H1_units), name='weights') 
        biases = tf.Variable(tf.zeros([H2_units]), name='biases')
        a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)

    with tf.name_scope('H3'):
        weights = tf.Variable(tf.truncated_normal([H2_units, H3_units], stddev=1.0/H2_units), name='weights') 
        biases = tf.Variable(tf.zeros([H3_units]), name='biases')
        a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)

    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(tf.truncated_normal([H3_units, 2], stddev=1.0/np.sqrt(H3_units)), name='weights') 
        biases = tf.Variable(tf.zeros([2]), name='biases')
        logits = tf.matmul(a3, weights) + biases

    return logits

def loss(logits, labels):
    labels = tf.to_int32(labels)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
    return tf.reduce_mean(cross_entropy, name='xentropy_mean')

def inspect_y(labels):
    return tf.reduce_sum(tf.cast(labels, tf.int32))

def training(loss, learning_rate):
    tf.summary.scalar('loss', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

def evaluation(logits, labels):
    labels = tf.to_int32(labels)
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def run_training(x, y, batch_size):
    with tf.Graph().as_default():
        x_placeholder, y_placeholder = generate_placeholder(batch_size)
        logits = inference(x_placeholder, 100, 100, 100)
        Loss = loss(logits, y_placeholder)
        y_sum = inspect_y(y_placeholder)
        train_op = training(Loss, 0.01)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        max_steps = 10000
        for step in range(max_steps):
            start_time = time.time()
            feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)
            _, loss_val = sess.run([train_op, Loss], feed_dict = feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))
    x_test = np.array(range(1000)) * 0.001
    x_test = np.reshape(x_test, (1000, 1))
    _ = sess.run(logits, feed_dict={x_placeholder: x_test})
    print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))
    print(_)

if __name__ == '__main__':

    population = 10000

    input_x = np.random.rand(population)
    input_y = np.copy(input_x)

    for bin in range(10):
        print(bin, bin/10, 0.5 - 0.5*(-1)**bin)
        input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin

    batch_size = 1000

    input_x = np.reshape(input_x, (population, 1))

    run_training(input_x, input_y, batch_size)

The sample output shows that the model always prefers the first class over the second: min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit for the first class is higher than the maximum logit for the second class across all test samples.
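To see why that inequality means every sample is assigned to the first class, here is a small sketch with made-up logits (the numbers are purely illustrative, not output from the model above):

```python
import numpy as np

# Hypothetical logits with the same property as the reported output:
# every class-0 logit exceeds every class-1 logit.
logits = np.array([[2.1, -0.3],
                   [1.7,  0.4],
                   [1.2,  0.9]])

always_class_0 = logits[:, 0].min() > logits[:, 1].max()
predictions = logits.argmax(axis=1)  # predicted class per row

print(always_class_0, predictions)
```

If the smallest class-0 logit beats the largest class-1 logit, then the class-0 logit wins in every individual row, so the argmax is 0 everywhere.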

My mistake. The problem was in this loop:

for i in range(batch_size):
    x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
    y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]

Because x_selected was created as [[None]] * batch_size, every row refers to the same inner list, so each assignment overwrites all rows with the same value. This code issue is now resolved. The fix is:

x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
    x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
    y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
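The aliasing behind the bug can be reproduced in isolation. This is a minimal sketch in plain Python, independent of TensorFlow; the variable names aliased and independent are chosen only for illustration.

```python
batch_size = 3

# Buggy construction: the outer list holds three references to ONE inner list.
aliased = [[None]] * batch_size
for i in range(batch_size):
    aliased[i][0] = i  # each write is visible through every row

# Independent rows: the comprehension builds a fresh inner list per row.
independent = [[None] for _ in range(batch_size)]
for i in range(batch_size):
    independent[i][0] = i

print(aliased)      # [[2], [2], [2]]
print(independent)  # [[0], [1], [2]]
```

Allocating a NumPy array with np.zeros, as in the fix above, avoids the problem as well, because every element has its own storage.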

After this fix, the model is showing more variation. It currently outputs class 0 for x <= 0.5 and class 1 for x > 0.5. But this is still far from ideal.

After changing the network to 4 hidden layers of 100 nodes each and training for 1 million steps (batch size = 100, sample size = 10 million), the model performs very well, with errors only at the edges where y flips. I therefore consider this question closed.

greeness · Accepted Answer

You are essentially trying to learn a periodic function, and that function is highly non-linear and non-smooth, so it is NOT as simple as it looks. In short, a better representation of the input feature helps.

Suppose you have a period T = 2, so f(x) = f(x+2). For the reduced problem where inputs and outputs are integers, the function becomes f(x) = 1 if x is odd else -1. In that case the problem reduces to this discussion, in which a neural network is trained to distinguish between odd and even numbers.

I guess the second bullet in that post should help (even for the general case when inputs are float numbers).

Try representing the numbers in binary using a fixed length precision.

In our reduced problem above, it's easy to see that the output is determined iff the least-significant bit is known.

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> -1
3:       0 1 1   -> 1
...
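The encoding in the table can be sketched as follows. This is only an illustration of the idea for the reduced integer problem; the helper names to_bits and parity_label and the width of 8 bits are assumptions, not something taken from the linked discussion.

```python
import numpy as np

def to_bits(n, width=8):
    """Fixed-width binary digits of a non-negative integer, most significant bit first."""
    return np.array([(n >> k) & 1 for k in range(width - 1, -1, -1)], dtype=np.float32)

def parity_label(n):
    """Target of the reduced problem: 1 for odd n, -1 for even n."""
    return 1 if n % 2 == 1 else -1

numbers = range(1, 9)
features = np.stack([to_bits(n) for n in numbers])       # shape (8, 8)
labels = np.array([parity_label(n) for n in numbers])

# The last column is the least-significant bit; mapping {0, 1} to {-1, 1}
# reproduces the labels exactly.
lsb_prediction = (2 * features[:, -1] - 1).astype(int)
print(np.array_equal(lsb_prediction, labels))
```

With this representation the target is a linear function of a single input feature, which is far easier for a network to fit than parity expressed on the raw number.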
