Reputation: 22634
I'm taking the Udacity Deep Learning class, and its homework says "Demonstrate an extreme case of overfitting. Restrict your training data to just a few batches."
My question is:
1) Why does reducing num_steps and num_batches have anything to do with overfitting? We are not adding any variables, nor increasing the size of W.
In the code below, num_steps used to be 3001 and num_batches 128, and the solution simply reduces them to 101 and 3, respectively.
num_steps = 101
num_bacthes = 3

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    # offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    offset = step % num_bacthes
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : 1e-3}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 2 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
This code is an excerpt from the solution: https://github.com/rndbrtrnd/udacity-deep-learning/blob/master/3_regularization.ipynb
2) Can someone explain the concept of "offset" in gradient descent? Why do we have to use it?
3) I've experimented with num_steps and found that if I increase num_steps, the accuracy goes up. Why? And how should I interpret num_steps in relation to the learning rate?
Upvotes: 0
Views: 647
Reputation: 11
1) It's quite typical to set early stopping conditions when you're training neural networks, in order to prevent overfitting. You're not adding new variables, but with early stopping conditions you can't use the existing ones as intensively (and as badly), which is more or less equivalent.
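For instance, here is a rough early-stopping sketch (the helpers train_one_step and evaluate_validation are hypothetical names, not from the notebook): keep training only while the validation loss keeps improving.

best_val_loss = float("inf")
patience = 100        # how many steps we tolerate without improvement
bad_steps = 0
for step in range(num_steps):
    train_one_step()                  # hypothetical: runs one optimizer step
    val_loss = evaluate_validation()  # hypothetical: returns current validation loss
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_steps = 0
    else:
        bad_steps += 1
        if bad_steps > patience:
            break                     # stop before the network starts to overfit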
2) In this case "offset" is the position in the training data where the minibatch starts; it's computed as the remainder of the division step % num_bacthes, so only the first few starting positions are ever used.
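As a quick illustration with the values from the question's code, these are the offsets it generates, and therefore the only rows of train_dataset that are ever touched:

batch_size = 128
num_batches = 3
for step in range(6):
    offset = step % num_batches   # remainder of the division: 0, 1, 2, 0, 1, 2, ...
    print("step", step, "-> rows", offset, "to", offset + batch_size)
# Only rows 0..130 of the training set are ever used, so the model keeps
# seeing (almost) the same examples and overfits to them.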
3) Think of "learning rate" as "speed" and "num_steps" as "time". If you run longer, you may drive further... but if you drive faster, you might crash and not get much further at all.
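A toy 1-D gradient descent run makes that concrete (just an illustration, unrelated to the notebook's model): minimize f(w) = w**2 and see what more steps or a larger learning rate do.

def run(learning_rate, num_steps, w=5.0):
    # plain gradient descent on f(w) = w**2, whose gradient is 2*w
    for _ in range(num_steps):
        w -= learning_rate * 2 * w
    return w

print(run(0.1, 10))    # ~0.54: some progress towards the minimum at 0
print(run(0.1, 100))   # ~0.0 : more steps at the same rate get much closer
print(run(1.1, 10))    # ~31 and growing: the rate is too large, the iterates diverge (the "crash")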
Upvotes: 1