Reputation:
I don't know if you can help me here, but I am having a problem I can't figure out. I have a large (for me) data set of around 450,000 entries. Each entry is a list of roughly 700 numbers, formatted like this:
[217088.0, 212992.0, 696.0, 191891.0, 524.0, 320.0, 0.0, 496.0, 0, 0, 364.0, 20.0, 0, 1.0, 0, 0.0, 0, 4.0, 22.0, 0, 672.0, 46.0, 16.0, 0.0, 0.0, 106496.0, 8.0, 0, 4.0, 2.0, 26.0, 640.0, 0.0, 1073741888.0, 624.0, 516.0, 4.0, 3.0, 0, 4319139.0, 0.0, 0, 0.0, 36.0, 8.0, 217088.0, 0.0, 0, 0, 0, 4.0, 5.0, 0, 20.0, 255624.0, 65535.0, 5.10153058443, 396.0, 4319140.0, 552.0, 144.0, 28.0, 5.0, 1048576.0, 217088.0, 350.0, 0.0, 0, 7.0, 1048576.0, 260.0, 0, 116.0, 0, 322.0, 0.0, 0, 4319141.0, 0.0, 10.0, 0.0, 9.0, 4.0, 0, 0, 0, 6.36484131641, 0.0, 0, 11.0, 72.0, 372.0, 45995.0, 217088.0, 0, 4096.0, 12.0, 80.0, 592.0, 264.0, 0, 0, 4096.0, 0.0, 256.0, 0.0, 49152.0, 700.0, 0, 4096.0, 0, 0, 0.0, 336.0, 8.0, 0, 0.0, 0, 4319142.0, 0.0, 60.0, 308.0, 4319143.0, 0, 0, 0, 0, 0, 0.742746270768, 316.0, 420.0, 276.0, 1073741888.0, 0.0, 332.0, 284.0, 0, 1107296320.0, 0.0, 4.0, 13.0, 18.0, 0.0, 632.0, 424.0, 261200.0, 0.0, 299008.0, 0.0, 4096.0, 0, 0.0, 299008.0, 0, 658.0, 0, 4319144.0, 4319145.0, 12.0, 50.0, 292.0, 688.0, 484.0, 70.0, 20.0, 4319146.0, 16.0, 17.0, 0, 0, 0, 0.0, 18.0, 4.0, 330.0, 0.0, 0, 0.0, 42.0, 303104.0, 19.0, 8.0, 20.0, 0.0, 0.0, 544.0, 340.0, 0, 14.0, 0, 209078.0, 0.0, 0.0, 22.0, 0, 209078.0, 0.0, 0.0, 18932.0, 4319147.0, 4.58031739078, 0.0, 376.0, 0.0, 0, 632.0, 4.0, 0, 0, 0, 428.0, 0, 0, 323584.0, 0.0, 24.0, 4.0, 368.0, 12.0, 40.0, 0, 720.0, 4.0, 348.0, 267.0, 20468.0, 32.0, 45995.0, 303104.0, 0.0, 0.0, 0, 0, 224.0, 16.0, 4.0, 44.0, 0.0, 0.0, 444.0, 720.0, 0, 1180.0, 0.0, 16.0, 412.0, 0.0, 4.0, 8462.0, 600.0, 568.0, 16.0, 0, 2.0, 36.0, 0.0, 6.0, 0, 21.0, 0.0, 24.0, 0, 4.0, 652.0, 4319148.0, 92.0, 8.0, 2.0, 0, 0.0, 0, 16.0, 0, 0, 324.0, 4.0, 300.0, 0, 278.0, 400.0, 0, 0.0, 0, 352.0, 0, 0.0, 209078.0, 8.0, 4096.0, 8.0, 36.0, 0.0, 256.0, 268435456.0, 0.0, 48.0, 4319149.0, 6.0, 4319150.0, 0, 416.0, 0, 0, 283.0, 4.0, 0, 0, 0, 8.0, 592.0, 0, 0, 25.0, 0.0, 0, 0, 0.0, 332.0, 212992.0, 540.0, 512.0, 0, 532.0, 20.0, 26.0, 0.0, 0, 52.0, 440.0, 7.0, 488.0, 8.0, 12.0, 0.0, 60.0, 14.0, 3221225536.0, 7.0, 56.0, 432.0, 4.0, 0, 12.0, 0.0, 40.0, 680.0, 16.0, 504.0, 344.0, 576.0, 0.0, 452.0, 266240.0, 290816.0, 578.0, 0, 552.0, 34.0, 0.0, 636.0, 88.0, 698.0, 282.0, 328.0, 38.0, 8.0, 480.0, 64.0, 4319151.0, 0.0, 0.0, 34.0, 460.0, 64.0, 0, 612.0, 0.0, 4319152.0, 0, 604.0, 0, 436.0, 0, 0, 20.0, 0, 4.0, 0, 0, 0, 0, 40.0, 356.0, 584.0, 0, 84.0, 0.0, 0, 0, 0, 294912.0, 7.0, 29.0, 20.0, 0, 60.0, 0.0, 268.0, 536.0, 4319153.0, 0.0, 106.0, 456.0, 24.0, 404.0, 0, 31.0, 0, 380.0, 24.0, 648.0, 0.0, 0, 0, 0.0, 0, 0, 0, 0.0, 0, 0, 0.0, 0.0, 1883.0, 5.85655736551, 34.0, 17744.0, 28680.0, 38.0, 36.0, 0.0, 24576.0, 596.0, 107.0, 33.0, 4.0, 5.0, 0, 0, 45995.0, 384.0, 8.0, 0, 0, 500.0, 20468.0, 34.0, 312.0, 8.0, 660.0, 0.0, 35.0, 608.0, 0, 684.0, 8.0, 68.0, 0.0, 32.0, 34.0, 23117.0, 3.0, 520.0, 0, 4319154.0, 0, 0, 512.0, 8.0, 28.0, 4096.0, 0, 538.0, 0.0, 572.0, 0.0, 2.0, 36.0, 0.0, 0.0, 32.0, 32.0, 4.0, 28.0, 0, 4.0, 38.0, 68.0, 9.0, 0.0, 0, 0.0, 36.0, 39.0, 618.0, 0, 8.0, 266240.0, 4.0, 5.0, 34.0, 304.0, 0, 0.0, 20.0, 40.0, 0.0, 0.0, 0, 580.0, 556.0, 4.0, 8.0, 262.0, 0, 12.0, 32.0, 0, 76.0, 12.0, 184.0, 720.0, 4.0, 16.0, 644.0, 16.0, 28680.0, 4319155.0, 720.0, 0.0, 564.0, 392.0, 672.0, 0.0, 24.0, 492.0, 0, 0.0, 676.0, 0, 0, 0, 12.0, 592.0, 360.0, 8.0, 692.0, 552.0, 4.0, 36.0, 512.0, 7198.0, 42.0, 44.0, 45.0, 4319156.0, 20.0, 388.0, 476.0, 5.0, 36.0, 20480.0, 47.0, 16.0, 326.0, 0.0, 12.0, 0.0, 0.0, 7.0, 272.0, 280.0, 0.0, 0, 288.0, 48.0, 4319157.0, 10.0, 448.0, 4.0, 4.0, 0, 20468.0, 408.0, 2.0, 50.0, 560.0, 0, 1610612768.0, 8.0, 0, 
620.0, 656.0, 4.0, 4096.0, 51.0, 0, 0, 0.0, 28.0, 0, 616.0, 0, 296.0, 2.0, 632.0, 468.0, 28.0, 32.0, 52.0, 0, 528.0, 0, 28.0, 0.0, 0, 24.0, 18.0, 4096.0, 0, 8.0, 180.0, 664.0, 4319158.0, 26.0, 0.0, 6.0, 0, 4096.0, 472.0, 0, 28.0, 72.0, 464.0, 672.0, 0, 24.0, 4.0, 0, 28680.0, 0, 0, 18.0, 0, 0, 4319159.0, 24.0, 28.0, 16.0]
I am using TFLearn to try to build a categorical model from this data, i.e. each entry has a 0 or 1 label and I'm trying to train the model to predict whether an unknown entry is a 0 or a 1. Here is a summary of my code:
import tflearn

def main():
    ## Options ##
    num_tf_layers = 10       # Number of fully connected layers, excluding the softmax layer
    num_tf_layer_nodes = 32  # Number of nodes in each fully connected layer
    print_test_scores = 1    # Bool to print test set and predictions
    use_validation_set = 0   # Bool to use testing set when fitting
    num_tf_epochs = 10
    tf_batch_size = 1
    tf_learn_rate = 0.001

    ## Opening files (loads trainX and temp_train_Y, plus testX and temp_test_Y if validating) ##

    print("Preparing labels...")
    trainY = tflearn.data_utils.to_categorical(temp_train_Y, nb_classes=2)
    if use_validation_set:
        testY = tflearn.data_utils.to_categorical(temp_test_Y, nb_classes=2)

    print('Forming input data...')
    net = tflearn.input_data(shape=[None, len(trainX[0])])

    print('Creating fully connected layers...')
    for i in range(num_tf_layers):
        net = tflearn.fully_connected(net, num_tf_layer_nodes)

    print('Creating softmax layer...')
    net = tflearn.fully_connected(net, 2, activation='softmax')

    print('Preparing regression...')
    net = tflearn.regression(net, learning_rate=tf_learn_rate)

    print('Preparing DNN...')
    model = tflearn.DNN(net)

    print('Fitting...')
    if use_validation_set:
        model.fit(trainX, trainY, n_epoch=num_tf_epochs, batch_size=tf_batch_size,
                  validation_set=(testX, testY), show_metric=True)
    else:
        model.fit(trainX, trainY, n_epoch=num_tf_epochs, batch_size=tf_batch_size,
                  show_metric=True)

    print('Complete...')
I based this off of the following TFLearn example. The program worked beautifully on a small data set of 250 0's and 250 1's: I was getting accuracy in the high 80% range, and I thought that adding a lot more data would push the accuracy higher. However, after adding the large amount of data, the loss goes to NaN extremely quickly, without even completing one pass through the 450,000 entries. After some research I saw that my learning rate might be too high, since I had left it at the default. I have set it to values between 0.1 and 0.000001, and nothing has stopped the loss from going to NaN. I have also tried changing the batch size between 1 and 1024, and changing the number of layers between 3 and 20. Nothing has helped. Does anyone have any ideas about what to change or how to approach this differently in order to fix it?
Thanks!
Upvotes: 2
Views: 1267
Reputation: 37721
I am guessing your network is suffering from the vanishing gradient problem. This is not a fundamental problem with neural networks - it's a problem with gradient-based learning methods caused by certain activation functions. Let's try to intuitively understand the problem and the cause behind it.
Problem
Gradient-based methods learn a parameter's value by understanding how a small change in the parameter's value will affect the network's output. If a change in a parameter's value causes only a very small change in the network's output, the network simply can't learn that parameter effectively - and that is a problem.
This is exactly what's happening in the vanishing gradient problem - the gradients of the network's output with respect to the parameters in the early layers become extremely small. That's a fancy way of saying that even a large change in the value of the parameters in the early layers doesn't have a big effect on the output. Let's try to understand when and why this problem happens.
Cause
The vanishing gradient problem depends on the choice of the activation function. Many common activation functions (e.g. sigmoid or tanh) 'squash' their input into a very small output range in a very non-linear fashion. For example, sigmoid maps the real number line onto the "small" range [0, 1]. As a result, there are large regions of the input space which are mapped to an extremely small output range. In these regions of the input space, even a large change in the input will produce only a small change in the output - hence the gradient is small.
This becomes much worse when we stack multiple layers of such non-linearities on top of each other. For instance, the first layer will map a large input region to a smaller output region, which will be mapped to an even smaller region by the second layer, which will be mapped to an even smaller region by the third layer and so on. As a result, even a large change in the parameters of the first layer doesn't change the output much.
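As a quick numeric illustration (a toy example, not part of the original answer): the sigmoid's derivative never exceeds 0.25, so by the chain rule the gradient reaching the first of several stacked sigmoid layers is a product of small factors and shrinks geometrically with depth.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)), maximal at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25   -- the largest the derivative ever gets
print(sigmoid_grad(5.0))   # ~0.0066 -- already tiny for moderately large inputs

# Chain rule across 10 stacked sigmoid layers: even in the best case the
# gradient reaching the first layer is scaled down by a factor of 0.25 per layer.
print(0.25 ** 10)          # ~9.5e-07

# Equivalently, a large change in an early parameter barely moves the output:
def toy_network(w, x=1.0):
    # Three sigmoid layers stacked on a single weight w.
    return sigmoid(sigmoid(sigmoid(w * x)))

print(toy_network(6.0) - toy_network(5.0))   # ~0.0002 for a whole unit of change in w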
We can avoid this problem by using activation functions which don't have this property of 'squashing' the input space into a small region. A popular choice is the Rectified Linear Unit (ReLU), which maps x to max(0, x).
Answer adapted from a post on Quora.
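Applied to the network in the question, switching the hidden layers to ReLU is a one-argument change per layer. A minimal sketch (layer and node counts copied from the question's options; the 700-feature input shape is an assumption based on the description of the data):

import tflearn

# ~700 input features per entry, as described in the question.
net = tflearn.input_data(shape=[None, 700])

# Hidden layers with ReLU activations instead of a squashing non-linearity.
for i in range(10):                                             # num_tf_layers
    net = tflearn.fully_connected(net, 32, activation='relu')   # num_tf_layer_nodes

# The output layer stays softmax so the network still yields class probabilities.
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net, learning_rate=0.001)
model = tflearn.DNN(net)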
Update: exploding gradient problem
Sometimes the gradient gets much larger in the earlier layers; this is known as the exploding gradient problem. For example, if you choose large values for the weight matrices and set the bias values in a way that makes the gradients larger, then the neural network will suffer from the exploding gradient problem. Another cause can be data points that are themselves large, producing very large steps during gradient descent even with low learning rates. So you can normalize the data points column-wise before training to avoid the exploding gradient problem.
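For example, a simple column-wise standardization of the feature matrix could look like the sketch below (plain NumPy, not part of the original answer; `entries` stands in for the question's roughly 450,000 x ~700 array, and the small epsilon guards against constant columns):

import numpy as np

# Placeholder for the real data: one row per entry, one column per feature.
# In the question this would be roughly 450,000 rows by ~700 columns.
entries = np.random.rand(1000, 700).astype(np.float32)

# Standardize each column to zero mean and unit variance so that features
# on wildly different scales (e.g. 1073741888.0 next to 4.0) become comparable.
col_mean = entries.mean(axis=0)
col_std = entries.std(axis=0) + 1e-8   # epsilon avoids division by zero
trainX = (entries - col_mean) / col_std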
Moreover, a large learning rate can be another potential cause of the exploding gradient problem. I encourage you to go through this article, which discusses the basic ideas behind both the vanishing and exploding gradient problems and their solutions.
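If you want to put a hard cap on the gradients in TFLearn itself, the DNN wrapper takes a clip_gradients argument (present in the TFLearn versions I'm aware of; check your version's documentation). A sketch combining that with a smaller learning rate:

import tflearn

net = tflearn.input_data(shape=[None, 700])    # ~700 features, as in the question
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.fully_connected(net, 2, activation='softmax')

# Smaller learning rate plus explicit gradient clipping.
net = tflearn.regression(net, learning_rate=0.0001)
model = tflearn.DNN(net, clip_gradients=1.0)   # TFLearn's default is 5.0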
Thanks to @timleathart for his insightful comment.
Upvotes: 2