How to find if a data set can train a neural network?

Question

Some experimental data contains 512 independent boolean features and a boolean result.

There are about 1e6 real experiment records in the provided data set.

In the classic XOR example, all 4 of the 4 possible input states are needed to train the NN. In my case the data covers only about 2^(20-512) = 2^-492 of the possible states (1e6 ≈ 2^20), which is practically zero.
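As a rough check of the coverage estimate above, here is a small sketch that assumes every one of the 1e6 records is a distinct input (otherwise the real coverage is even smaller). The variable names are illustrative only.

```python
import math

n_features = 512   # independent boolean inputs
n_records = 1e6    # records in the data set

# log2 of the fraction of the 2**512 possible inputs that the data covers,
# assuming all records are distinct (an upper bound if some repeat)
log2_coverage = math.log2(n_records) - n_features

print(f"coverage is about 2^{log2_coverage:.1f} of all possible inputs")
```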

I have no more information about the data nature, just these (512 + 1) * 1e6 bits.

I tried an NN with one hidden layer on the available data. The output of the trained NN, even on samples from the training set, is always close to 0; not a single one is close to 1. I have experimented with the weight initialization and the gradient descent learning rate.

My code uses TensorFlow 1.3 and Python 3. Model excerpt:

with tf.name_scope("Layer1"):
    #W1 = tf.Variable(tf.random_uniform([512, innerN], minval=-2/512, maxval=2/512), name="Weights_1")
    W1 = tf.Variable(tf.zeros([512, innerN]), name="Weights_1")
    b1 = tf.Variable(tf.zeros([1]), name="Bias_1")
    
    Out1 = tf.sigmoid( tf.matmul(x, W1) + b1)

with tf.name_scope("Layer2"):
    W2 = tf.Variable(tf.random_uniform([innerN, 1], minval=-2/512, maxval=2/512), name="Weights_2")
    #W2 = tf.Variable(tf.zeros([innerN, 1]), name="Weights_2")
    b2 = tf.Variable(tf.zeros([1]), name="Bias_2")
    
    y = tf.nn.sigmoid( tf.matmul(Out1, W2) + b2)

with tf.name_scope("Training"):
    y_ = tf.placeholder(tf.float32, [None,1])
    
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            labels = y_, logits = y)
    )

    train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cross_entropy)
    
with tf.name_scope("Testing"):
    # Test trained model
    correct_prediction = tf.equal( tf.round(y), tf.round(y_))
# ...
# Train
for step in range(500):
    batch_xs, batch_ys = Datasets.train.next_batch(300, shuffle=False)
    _, my_y, summary = sess.run([train_step, y, merged_summaries],
        feed_dict={x: batch_xs, y_: batch_ys})

I suspect two cases:

My fault: a bad NN implementation or the wrong architecture.
Bad data. As the XOR example suggests, incomplete training data could result in a failing NN. However, the trained NN should at least give the right predictions for the training examples it was fed, shouldn't it?

How can I evaluate whether it is possible at all to train a neural network (a 2-layer perceptron) on the provided data to predict the result? The XOR example would be a case of an acceptable data set, as opposed to random noise.

gngdb · Accepted Answer

There are only ad hoc ways to know if it is possible to learn a function with a differentiable network from a dataset. That said, these ad hoc ways do usually work. For example, the network should be able to overfit the training set without any regularisation.

A common technique to gauge this is to only fit the network on a subset of the full dataset. Check that the network can overfit to that, then increase the size of the subset, and increase the size of the network as well. Unfortunately, deciding whether to add extra layers or add more units in a hidden layer is an arbitrary decision you'll have to make.
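The subset check described above can be sketched without any framework. This is a minimal pure-Python illustration, not the answerer's code: `overfit_check` and its parameters are hypothetical names, and the network (one hidden layer of ReLU units, a sigmoid output, no regularisation, full-batch gradient descent) is just one reasonable choice. It reports the training loss before and after training and the final training accuracy.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def overfit_check(xs, ys, hidden=8, steps=2000, lr=0.1, seed=0):
    """Fit a small net to (xs, ys); return (initial_loss, final_loss, accuracy)."""
    rng = random.Random(seed)
    n, n_in = len(xs), len(xs[0])
    # Small random (non-zero) weights; biases start at zero.
    w1 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n_in)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [max(0.0, b1[j] + sum(x[i] * w1[i][j] for i in range(n_in)))
             for j in range(hidden)]
        return h, sigmoid(b2 + sum(h[j] * w2[j] for j in range(hidden)))

    def mean_loss():
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for x, y in zip(xs, ys):
            p = forward(x)[1]
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / n

    initial_loss = mean_loss()
    for _ in range(steps):
        # Accumulate gradients over the whole (small) training subset.
        g_w1 = [[0.0] * hidden for _ in range(n_in)]
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        for x, y in zip(xs, ys):
            h, p = forward(x)
            d_out = (p - y) / n  # d(loss)/d(logit) for sigmoid + cross-entropy
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                if h[j] > 0.0:  # ReLU passes gradient only where it is active
                    d_h = d_out * w2[j]
                    g_b1[j] += d_h
                    for i in range(n_in):
                        g_w1[i][j] += d_h * x[i]
        # Plain gradient descent step.
        b2 -= lr * g_b2
        for j in range(hidden):
            w2[j] -= lr * g_w2[j]
            b1[j] -= lr * g_b1[j]
            for i in range(n_in):
                w1[i][j] -= lr * g_w1[i][j]

    accuracy = sum((forward(x)[1] >= 0.5) == (y == 1)
                   for x, y in zip(xs, ys)) / n
    return initial_loss, mean_loss(), accuracy


# Usage on the XOR example from the question.
XOR_X = [[0, 0], [0, 1], [1, 0], [1, 1]]
XOR_Y = [0, 1, 1, 0]
initial, final, acc = overfit_check(XOR_X, XOR_Y)
print("loss before:", initial, "loss after:", final, "train accuracy:", acc)
```

If the loss refuses to fall even on a few hundred records, that points more towards the implementation than the data. If small subsets can be memorised, grow the subset and the hidden layer together.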

However, looking at your code, there are a few things that could be going wrong here:

Are your outputs balanced? By that I mean, do you have the same number of 1s as 0s in the dataset targets?
Your first-layer weights are initialised to all zeros, so every hidden unit starts out identical and the gradients reaching that layer are tiny or zero, which makes it hard to learn anything (although you have a real initialisation commented out just above it).
Sigmoid nonlinearities are more difficult to optimise than simpler nonlinearities, such as ReLUs.
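The balance question in the list above is quick to check. The sketch below uses a hypothetical helper, `class_balance`, and made-up labels purely for illustration. It reports the fraction of 1s and the accuracy a model would get by always predicting the majority class. If positives are rare, a network whose outputs are always near 0 may simply be reproducing that baseline.

```python
def class_balance(labels):
    """Return (fraction_of_ones, majority_baseline_accuracy) for 0/1 labels."""
    n = len(labels)
    ones = sum(1 for y in labels if y == 1)
    frac = ones / n
    # Always predicting the more common class gets this accuracy for free.
    return frac, max(frac, 1.0 - frac)


# Illustrative, made-up targets: 5 positives out of 100 records.
example_labels = [0] * 95 + [1] * 5
frac, baseline = class_balance(example_labels)
print("fraction of 1s:", frac, "majority baseline:", baseline)
```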

I'd recommend using the built-in layer definitions in TensorFlow so that you don't have to worry about initialisation, and switching to ReLUs in any hidden layers (you still need a sigmoid at the output for your boolean target).
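On the sigmoid output: one further detail may matter in the question's code, offered here as an observation rather than something the answer states. A softmax over a single output is always 1, so a softmax cross-entropy on a one-column output evaluates to zero whatever the weights are and gives no gradient. A sigmoid cross-entropy computed on the pre-sigmoid logit does not have this problem. The pure-Python sketch below shows the difference; the helper names are illustrative and are not TensorFlow APIs.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between labels and softmax(logits) for one example."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return -sum(y * math.log(e / total) for y, e in zip(labels, exps))


def sigmoid_cross_entropy(logit, label):
    """Binary cross-entropy on one pre-sigmoid logit (numerically stable form)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# With a single output, the softmax loss ignores the logit entirely,
# while the sigmoid loss shrinks as the logit moves towards the label.
for logit in (-3.0, 0.0, 3.0):
    print(logit,
          softmax_cross_entropy([logit], [1.0]),
          sigmoid_cross_entropy(logit, 1.0))
```

In TensorFlow 1.x the corresponding op is `tf.nn.sigmoid_cross_entropy_with_logits`, which takes the pre-sigmoid value as `logits` rather than the already-sigmoided `y`.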

Finally, deep learning isn't actually very good at most "bag of features" machine learning problems because they lack structure. For example, the order of the features doesn't matter. Other methods often work better, but if you really want to use deep learning then you could look at this recent paper, showing improved performance by just using a very specific nonlinearity and weight initialisation (change 4 lines in your code above).
