Reputation: 6469
I'm currently studying Deep Learning on Udacity.
I successfully built and trained a neural network with one hidden layer, and I got 93% accuracy on the test data. However, when I introduced L2 regularization into my model, the accuracy dropped to 89%. Is there something wrong with my regularization?
beta = 0.01
n_hidden_layer = 1024
n_input = 784 # 28* 28
n_classes = 10
# Variables
weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_layer], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_layer, n_classes], stddev=0.1))
}
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden_layer])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes]))
}

def multilayer_perceptron(x, weights, biases):
    # Hidden layer with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    return out_layer
# Construct model
pred = multilayer_perceptron(x, weights, biases)
valid_pred = multilayer_perceptron(tf_valid_dataset, weights, biases)
test_pred = multilayer_perceptron(tf_test_dataset, weights, biases)
# Define loss and optimizer
# L' = L + Beta * (0.5 * ||w||^2)
l2 = beta * tf.nn.l2_loss(weights['h1']) + beta * tf.nn.l2_loss(weights['out'])
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred) + l2)
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
The right amount of regularization should improve your validation / test accuracy.
But when I change beta to 0.001, I get 93.7% accuracy. So, should I define beta as a tf.Variable so that it tunes itself?
Upvotes: 1
Views: 2350
Reputation: 4451
To understand why you need the beta variable, you have to understand what L2 regularisation does: it penalises large weights! How strongly it should penalise those weights depends on the application. Some applications need larger weights than others.
The beta variable is a hyperparameter that you have to set "manually". It is not something you should add as a tf.Variable. What you can do, however, is run a quick hyperparameter search where you iterate over several values of beta and pick the best one. Try plotting the validation loss for each value to determine which works best; a minimal sketch of such a search is below.
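For example, something like this. Note that train_and_evaluate is a hypothetical helper, not part of your code: it is assumed to rebuild your graph with the given beta, run your training loop, and return the validation accuracy.

# Hypothetical helper, assumed to exist: rebuilds the model with the
# given beta, trains it, and returns validation accuracy.
# def train_and_evaluate(beta): ...

betas = [0.0, 0.0001, 0.001, 0.01, 0.1]
results = {}
for beta in betas:
    results[beta] = train_and_evaluate(beta)

# Pick the beta with the highest validation accuracy.
best_beta = max(results, key=results.get)
print('best beta:', best_beta, 'validation accuracy:', results[best_beta])

A log-spaced grid like this is a common starting point, since the useful range of beta usually spans several orders of magnitude.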
Let me know if you have any more questions!
Upvotes: 2