Clarification between Epoch and iteration

Question

This answer points to the difference between an epoch and an iteration when training a neural network. However, when I look at the source code for the Solver API in the Stanford CS231n course (and I'm assuming most libraries out there do the same), each iteration randomly selects batch_size examples with replacement. So there is no guarantee that every example will be seen during each epoch, is there?

Does an epoch then mean that all examples would be seen in expectation? Or am I understanding this wrong?

Relevant Source Code:

  def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
      dw = grads[p]
      config = self.optim_configs[p]
      next_w, next_config = self.update_rule(w, dw, config)
      self.model.params[p] = next_w
      self.optim_configs[p] = next_config

  def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
      self._step()

      # Maybe print training loss
      if self.verbose and t % self.print_every == 0:
        print '(Iteration %d / %d) loss: %f' % (
               t + 1, num_iterations, self.loss_history[-1])

      # At the end of every epoch, increment the epoch counter and decay the
      # learning rate.
      epoch_end = (t + 1) % iterations_per_epoch == 0
      if epoch_end:
        self.epoch += 1
        for k in self.optim_configs:
          self.optim_configs[k]['learning_rate'] *= self.lr_decay

      # Check train and val accuracy on the first iteration, the last
      # iteration, and at the end of each epoch.
      first_it = (t == 0)
      last_it = (t == num_iterations - 1)
      if first_it or last_it or epoch_end:
        train_acc = self.check_accuracy(self.X_train, self.y_train,
                                        num_samples=1000)
        val_acc = self.check_accuracy(self.X_val, self.y_val)
        self.train_acc_history.append(train_acc)
        self.val_acc_history.append(val_acc)

        if self.verbose:
          print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                 self.epoch, self.num_epochs, train_acc, val_acc)

        # Keep track of the best model
        if val_acc > self.best_val_acc:
          self.best_val_acc = val_acc
          self.best_params = {}
          for k, v in self.model.params.iteritems():
            self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params

Thanks.

javidcf · Accepted Answer

I believe, as you say, that the Stanford course is effectively using "epoch" in a looser sense: the expected number of times each example is seen during training. In my experience, though, most implementations treat an epoch as one pass through every example in the training set, and I'd say the course only chose sampling with replacement for simplicity. With a reasonably large dataset you will probably not notice a difference, but it is still more correct to sample without replacement until no examples are left.
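To make the difference concrete, here is a minimal sketch in plain Python. It is not the CS231n or Keras code: the names `coverage_with_replacement` and `epoch_batches` are made up for illustration, and `rng.randrange` stands in for `np.random.choice`. With replacement, a given example is missed by each draw with probability 1 - 1/N, so after N draws the expected fraction of distinct examples seen is 1 - (1 - 1/N)^N, which is roughly 0.632 for large N. Shuffling once and slicing, by contrast, visits every example exactly once.

```python
import random


def coverage_with_replacement(num_train, batch_size, rng):
    """Fraction of distinct examples drawn in one 'epoch' when sampling with replacement."""
    iterations_per_epoch = max(num_train // batch_size, 1)
    seen = set()
    for _ in range(iterations_per_epoch):
        # Each index is drawn independently, so repeats are possible.
        seen.update(rng.randrange(num_train) for _ in range(batch_size))
    return len(seen) / num_train


def epoch_batches(num_train, batch_size, rng):
    """Yield index lists that together cover every example exactly once."""
    indices = list(range(num_train))
    rng.shuffle(indices)
    for start in range(0, num_train, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(0)
num_train, batch_size = 10000, 100

observed = coverage_with_replacement(num_train, batch_size, rng)
draws = max(num_train // batch_size, 1) * batch_size
expected = 1.0 - (1.0 - 1.0 / num_train) ** draws

covered = sorted(i for batch in epoch_batches(num_train, batch_size, rng) for i in batch)

print("with replacement: %.3f observed, %.3f expected" % (observed, expected))
print("without replacement covers all:", covered == list(range(num_train)))
```

So under the with-replacement scheme, roughly a third of the training set is expected to go unseen in any single "epoch", even though each example is still seen once per epoch on average.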

You can check, for example, how Keras does the training in its source code; it's a bit complicated, but the important point is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your initial idea of "epoch".
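The splitting step can be sketched roughly as follows. This only illustrates the idea of cutting a (possibly shuffled) index list into consecutive batches; it is not Keras's actual code, and `batch_bounds` and `run_epoch` are hypothetical helpers written for this example.

```python
import random


def batch_bounds(size, batch_size):
    """Return (start, end) pairs that partition range(size) into consecutive batches."""
    return [(start, min(start + batch_size, size))
            for start in range(0, size, batch_size)]


def run_epoch(examples, batch_size, shuffle=True, rng=None):
    """Return the batches for one epoch; every example lands in exactly one batch."""
    order = list(range(len(examples)))
    if shuffle:
        # Fall back to the module-level generator when no rng is supplied.
        (rng or random).shuffle(order)
    return [[examples[i] for i in order[start:end]]
            for start, end in batch_bounds(len(order), batch_size)]


bounds = batch_bounds(10, 4)  # [(0, 4), (4, 8), (8, 10)]
batches = run_epoch(list("abcdefghij"), 4, rng=random.Random(0))
print(bounds)
print(batches)
```

The last batch is simply shorter when the dataset size is not a multiple of the batch size, so nothing is dropped or repeated within an epoch.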
