user3709260

Reputation: 431

keep_prob value in dropout and getting worse results with dropout

According to this link, the value of keep_prob has to be in (0, 1]: Tensorflow manual

Otherwise I get a ValueError:

ValueError: If keep_prob is not in (0, 1] or if x is not a floating point tensor.
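For example (a quick sketch), passing a constant keep_prob outside (0, 1] raises the error right away, at graph construction time:

import tensorflow as tf

# Reproducing the error: a constant keep_prob outside (0, 1] fails
# as soon as the dropout op is built.
t = tf.ones([10])
tf.nn.dropout(t, keep_prob=4.0)  # raises the ValueError above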

I'm using the following code for a simple neural network with one hidden layer:

import numpy as np
import tensorflow as tf

n_nodes_input = len(train_x.columns)  # number of input features
n_nodes_hl = 30  # number of units in the hidden layer
n_classes = len(np.unique(Y_train_numeric))  # number of output classes
lr = 0.25

x = tf.placeholder('float', [None, len(train_x.columns)])
y = tf.placeholder('float')
dropout_keep_prob = tf.placeholder(tf.float32)

def neural_network_model(data, dropout_keep_prob):
    # define weights and biases for each layer
    hidden_layer = {'weights':tf.Variable(tf.truncated_normal([n_nodes_input, n_nodes_hl], stddev=0.3)),
                      'biases':tf.Variable(tf.constant(0.1, shape=[n_nodes_hl]))}
    output_layer = {'weights':tf.Variable(tf.truncated_normal([n_nodes_hl, n_classes], stddev=0.3)),
                    'biases':tf.Variable(tf.constant(0.1, shape=[n_classes]))}
    # feed forward and activations
    l1 = tf.add(tf.matmul(data, hidden_layer['weights']), hidden_layer['biases'])
    l1 = tf.nn.sigmoid(l1)
    l1 = tf.nn.dropout(l1, dropout_keep_prob)
    output = tf.matmul(l1, output_layer['weights']) + output_layer['biases']

    return output

def main():
    prediction = neural_network_model(x, dropout_keep_prob)
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=prediction))
    optimizer = tf.train.AdamOptimizer(lr).minimize(cost)

    sess = tf.InteractiveSession()

    tf.global_variables_initializer().run()
    for epoch in range(1000):
        loss = 0
        _, c = sess.run([optimizer, cost], feed_dict = {x: train_x, y: train_y, dropout_keep_prob: 4.})
        loss += c

        if (epoch % 100 == 0 and epoch != 0):
            print('Epoch', epoch, 'completed out of', 1000, 'Training loss:', loss)
    correct = tf.equal(tf.argmax(prediction,1), tf.argmax(y,1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name='op_accuracy')

    writer = tf.summary.FileWriter('graph',sess.graph)
    writer.close()

    print('Train set Accuracy:', sess.run(accuracy, feed_dict = {x: train_x, y: train_y, dropout_keep_prob: 1.}))
    print('Test set Accuracy:', sess.run(accuracy, feed_dict = {x: test_x, y: test_y, dropout_keep_prob: 1.}))
    sess.close()


if __name__ == '__main__':
    main()

If I use a number in the range (0, 1] for dropout_keep_prob in sess.run, the accuracy drops drastically. If I use a number bigger than 1, like 4, the accuracy goes beyond 0.9. When I press Shift+Tab on tf.nn.dropout(), this is part of the docstring:

With probability `keep_prob`, outputs the input element scaled up by
`1 / keep_prob`, otherwise outputs `0`.  The scaling is so that the expected
sum is unchanged.

which suggests to me that keep_prob has to be greater than 1; otherwise nothing would be dropped!

Bottom line, I'm confused. Which part of dropout have I implemented wrong, such that my results are getting worse, and what is a good value for keep_prob?

Thank you

Upvotes: 1

Views: 3886

Answers (1)

Dennis Soemers

Reputation: 8478

which suggests to me that keep_prob has to be greater than 1; otherwise nothing would be dropped!

The description says:

With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged.

This means that:

  • keep_prob is used as a probability, so by definition it should always be in [0, 1] (a number outside that range can never be a probability)
  • With probability keep_prob, input elements are multiplied by 1 / keep_prob. Since the API requires 0 < keep_prob <= 1, the factor 1 / keep_prob is always at least 1.0 (exactly 1.0 when keep_prob == 1). So, with probability keep_prob, elements become bigger than they would be without dropout
  • With probability 1 - keep_prob (the "otherwise" in the description), elements are set to 0. This is the actual dropout: elements are dropped when they are set to 0. If you set keep_prob to exactly 1.0, the probability of dropping any node becomes 0. So, if you want to drop some nodes, set keep_prob < 1, and if you don't want to drop anything, set keep_prob = 1 (see the sketch below).
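To make this concrete, here's a minimal sketch (assuming TensorFlow 1.x, where tf.nn.dropout takes a keep_prob argument):

import tensorflow as tf

# Dropout on a vector of ones with keep_prob = 0.5: each element is kept
# (and scaled to 1 / 0.5 = 2.0) with probability 0.5, otherwise set to 0,
# so the expected sum stays at 10.
t = tf.ones([10])
dropped = tf.nn.dropout(t, keep_prob=0.5)

with tf.Session() as sess:
    print(sess.run(dropped))  # e.g. [2. 0. 2. 2. 0. 0. 2. 2. 0. 2.]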

Important note: You only want to use dropout during training, not during testing.

If I use a number in the range (0, 1] for dropout_keep_prob in sess.run, the accuracy drops drastically.

If you do this for the test set, or if you mean you're reporting accuracy on the training set, that doesn't surprise me. Dropout means losing information, so it is indeed going to lose accuracy. It's supposed to be a way of regularizing though; you intentionally lose accuracy during the training phase, but hope this results in improved generalization and therefore improved accuracy during the test phase (when you should no longer be using dropout).
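Concretely, reusing the placeholders and ops from your code above, the usual pattern would look something like this (0.5 is just a common starting value, not something prescribed; you'd tune it like any other hyperparameter):

# Training step: keep each hidden unit with probability 0.5
# (a common starting value; 0.5 to 0.8 is a typical range).
_, c = sess.run([optimizer, cost],
                feed_dict={x: train_x, y: train_y, dropout_keep_prob: 0.5})

# Evaluation: disable dropout entirely by keeping every unit.
acc = sess.run(accuracy,
               feed_dict={x: test_x, y: test_y, dropout_keep_prob: 1.0})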

If I use a number bigger than 1, like 4, the accuracy goes beyond 0.9.

I'm surprised you got this code to run at all; based on the source code, I wouldn't expect it to run.

Upvotes: 3
