Willjay

Reputation: 6469

How to tune L2 Regularization

I'm currently studying Deep Learning on Udacity.

I successfully built and trained a neural network with one hidden layer, and I got 93% accuracy on the test data. However, when I introduced L2 regularization into my model, the accuracy dropped to 89%. Is there something wrong with my regularization?

import tensorflow as tf

beta = 0.01           # L2 regularization strength

n_hidden_layer = 1024
n_input = 784         # 28 * 28
n_classes = 10

# Variables
weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_layer], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_layer, n_classes], stddev=0.1))
}
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden_layer])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes]))
}

def multilayer_perceptron(x, weights, biases):
    # Hidden layer with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    return out_layer

# Construct model (x, y, tf_valid_dataset and tf_test_dataset are the
# placeholders / constant datasets defined earlier in the notebook)
pred = multilayer_perceptron(x, weights, biases)
valid_pred = multilayer_perceptron(tf_valid_dataset, weights, biases)
test_pred = multilayer_perceptron(tf_test_dataset, weights, biases)

# Define loss and optimizer
# L' = L + beta * (0.5 * ||w||^2); tf.nn.l2_loss(w) computes sum(w ** 2) / 2
l2 = beta * (tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['out']))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred) + l2)
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
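For reference, tf.nn.l2_loss already includes the 0.5 factor from the comment above, so l2 really is beta * 0.5 * ||w||^2. A quick sanity check with made-up values, assuming TensorFlow 1.x:

w = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.l2_loss(w)))   # (1 + 4 + 9) / 2 = 7.0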

The right amount of regularization should improve your validation / test accuracy.

But when I changed beta to 0.001, I got 93.7% accuracy. So, should I define beta as a tf.Variable so that it tunes itself?

Upvotes: 1

Views: 2350

Answers (1)

rmeertens

Reputation: 4451

To understand why you need to tune beta, you have to understand what L2 regularisation does: it punishes large weights! How strongly it should punish those weights depends on the application; some applications need larger weights than others.

The beta variable is a hyperparameter you have to set "manually". It is not something you should add as a tf.Variable: beta only ever increases the loss, so the optimizer would simply drive it to zero. What you can do, however, is run a quick hyperparameter search where you iterate over several values of beta and pick the best one, as in the sketch below. Try plotting the validation loss for each value to see which works best!
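A minimal sketch of such a search, reusing the graph from the question: feed beta through a placeholder so one graph serves every candidate value, retrain from scratch for each one, and keep whichever gives the best validation accuracy. The names num_steps, next_batch, accuracy, valid_x and valid_y are stand-ins for whatever your notebook already defines:

beta_ph = tf.placeholder(tf.float32, shape=[])        # beta fed in at run time
l2 = beta_ph * (tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['out']))
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred)) + l2
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

best_beta, best_acc = None, 0.0
for beta_value in [0.0, 1e-4, 1e-3, 1e-2, 1e-1]:      # log-spaced candidates
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())   # retrain from scratch
        for step in range(num_steps):
            batch_x, batch_y = next_batch()           # your training batches
            sess.run(optimizer,
                     feed_dict={x: batch_x, y: batch_y, beta_ph: beta_value})
        acc = sess.run(accuracy, feed_dict={x: valid_x, y: valid_y})
        if acc > best_acc:
            best_beta, best_acc = beta_value, acc
print('best beta:', best_beta, 'validation accuracy:', best_acc)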

Let me know if you have any more questions!

Upvotes: 2
