Reputation: 22634
I'm taking the Udacity Deep Learning class, and its homework says "Demonstrate an extreme case of overfitting. Restrict your training data to just a few batches."
My question is:
1) Why does reducing num_steps and num_batches have anything to do with overfitting? We are not adding any variables, nor increasing the size of W.
In the code below, num_steps used to be 3001 and num_batches 128, and the solution simply reduces them to 101 and 3, respectively.
num_steps = 101
num_bacthes = 3

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    # offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    offset = step % num_bacthes
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : 1e-3}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 2 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
This code is an excerpt from the solution: https://github.com/rndbrtrnd/udacity-deep-learning/blob/master/3_regularization.ipynb
2) Can someone explain the concept of "offset" in gradient descent? Why do we have to use it?
3) I've experimented with num_steps and found that if I increase num_steps, the accuracy goes up. Why? And how should I interpret num_steps in relation to the learning rate?
Upvotes: 0
Views: 647
Reputation: 11
1) It's quite typical to set early stopping conditions when you're training neural networks, in order to prevent overfitting. You're not adding new variables, but with early stopping conditions you can't use the existing ones as intensively (and as badly), which is more or less equivalent.
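For instance, here is a rough early-stopping sketch (the helpers train_one_step and evaluate_validation are hypothetical names, not from the notebook): keep training only while the validation loss keeps improving.

best_val_loss = float("inf")
patience = 100        # how many steps we tolerate without improvement
bad_steps = 0
for step in range(num_steps):
    train_one_step()                  # hypothetical: runs one optimizer step
    val_loss = evaluate_validation()  # hypothetical: returns current validation loss
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_steps = 0
    else:
        bad_steps += 1
        if bad_steps > patience:
            break                     # stop before the network starts to overfit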
2) In this case "offset" is the position in the training data where the minibatch starts; it's computed as the remainder of the division step % num_bacthes, so only the first few starting positions are ever used.
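As a quick illustration with the values from the question's code, these are the offsets it generates, and therefore the only rows of train_dataset that are ever touched:

batch_size = 128
num_batches = 3
for step in range(6):
    offset = step % num_batches   # remainder of the division: 0, 1, 2, 0, 1, 2, ...
    print("step", step, "-> rows", offset, "to", offset + batch_size)
# Only rows 0..130 of the training set are ever used, so the model keeps
# seeing (almost) the same examples and overfits to them.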
3) Think of "learning rate" as "speed" and "num_steps" as "time". If you run longer, you may drive further... but if you drive faster, you might crash and not get much further at all.
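A toy 1-D gradient descent run makes that concrete (just an illustration, unrelated to the notebook's model): minimize f(w) = w**2 and see what more steps or a larger learning rate do.

def run(learning_rate, num_steps, w=5.0):
    # plain gradient descent on f(w) = w**2, whose gradient is 2*w
    for _ in range(num_steps):
        w -= learning_rate * 2 * w
    return w

print(run(0.1, 10))    # ~0.54: some progress towards the minimum at 0
print(run(0.1, 100))   # ~0.0 : more steps at the same rate get much closer
print(run(1.1, 10))    # ~31 and growing: the rate is too large, the iterates diverge (the "crash")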
Upvotes: 1