How to modify the seq2seq cost function for padded vectors?

Question

TensorFlow supports dynamic-length sequences through the 'sequence_length' parameter passed when constructing the RNN layer: for each example, the model stops learning past 'sequence_length' steps and returns a zero vector for the remaining time steps.

However, how can the cost function at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py#L890 be modified to account for the masked sequences, so that cost and perplexity are calculated only on the actual sequences rather than on the whole padded sequence?

def sequence_loss_by_example(logits, targets, weights,
                             average_across_timesteps=True,
                             softmax_loss_function=None, name=None):
    if len(targets) != len(logits) or len(weights) != len(logits):
        raise ValueError("Lengths of logits, weights, and targets must be the same "
                         "%d, %d, %d." % (len(logits), len(weights), len(targets)))
    with ops.op_scope(logits + targets + weights, name,
                      "sequence_loss_by_example"):
        log_perp_list = []
        for logit, target, weight in zip(logits, targets, weights):
            if softmax_loss_function is None:
                # TODO(irving,ebrevdo): This reshape is needed because
                # sequence_loss_by_example is called with scalars sometimes, which
                # violates our general scalar strictness policy.
                target = array_ops.reshape(target, [-1])
                crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
                    logit, target)
            else:
                crossent = softmax_loss_function(logit, target)
            log_perp_list.append(crossent * weight)
        log_perps = math_ops.add_n(log_perp_list)
        if average_across_timesteps:
            total_size = math_ops.add_n(weights)
            total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
            log_perps /= total_size
    return log_perps

Avishkar Bhoopchand · Accepted Answer

This function already supports calculating costs for dynamic sequence lengths through the use of weights. As long as you ensure the weights are 0 for the "padding targets", the cross entropy is multiplied by 0 for those steps and so contributes nothing to the sum:

log_perp_list.append(crossent * weight)

and the total size will also reflect only the non-padding steps:

total_size = math_ops.add_n(weights)
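
To make the effect of these two lines concrete, here is a minimal plain-Python sketch of the same arithmetic for a single sequence. It is not the TensorFlow implementation: the function name masked_average_loss and the loss values are made up for illustration.

```python
import math


def masked_average_loss(step_losses, weights, eps=1e-12):
    """Weighted mean of per-timestep losses for one sequence.

    Mirrors `crossent * weight`, the sum over timesteps, and the
    division by the summed weights (plus a small epsilon).
    """
    weighted_sum = sum(loss * w for loss, w in zip(step_losses, weights))
    total_size = sum(weights) + eps  # avoids division by 0 for all-0 weights
    return weighted_sum / total_size


# Hypothetical per-timestep cross-entropy values:
# three real steps followed by two padding steps.
step_losses = [2.0, 1.0, 3.0, 5.0, 7.0]
weights = [1.0, 1.0, 1.0, 0.0, 0.0]

masked = masked_average_loss(step_losses, weights)      # averages over 3 steps
unmasked = masked_average_loss(step_losses, [1.0] * 5)  # averages over all 5 steps

# Perplexity computed from the masked loss ignores the padding steps too.
masked_perplexity = math.exp(masked)
```

With the zero weights, the two padding losses drop out of both the numerator and the denominator, so the average is taken over the three real steps only.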

If you're padding with zeros (and 0 is not also a valid target ID), one way to derive the weights is as follows:

weights = tf.sign(tf.abs(model.targets))

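(Note that you will probably need to cast this to a float type such as tf.float32, since the weights are multiplied with the floating-point cross-entropy values while the targets are usually integers.)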
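
Since sequence_loss_by_example takes one weights entry per timestep, the same idea can be sketched in plain Python on time-major lists. The helper names below (weights_from_padded_targets, weights_from_lengths) and the pad_id parameter are hypothetical, and the first helper assumes ID 0 is reserved for padding. If 0 can also be a real target, building the mask from known sequence lengths may be the safer option.

```python
def weights_from_padded_targets(targets, pad_id=0):
    """Return 1.0 for real targets and 0.0 for padding.

    `targets` is a list over timesteps, each a list over batch entries.
    For pad_id == 0 this matches sign(abs(target)) cast to float.
    """
    return [[0.0 if t == pad_id else 1.0 for t in step] for step in targets]


def weights_from_lengths(lengths, max_len):
    """Alternative: build the same mask from per-example sequence lengths."""
    return [[1.0 if t < n else 0.0 for n in lengths] for t in range(max_len)]


# Time-major targets: 4 timesteps, batch of 2.
# The second sequence has length 2 and is padded with zeros.
targets = [[5, 9], [3, 4], [7, 0], [2, 0]]

weights = weights_from_padded_targets(targets)
weights_alt = weights_from_lengths([4, 2], max_len=4)
```

Both helpers produce the same mask here. They only differ when a real target happens to equal the padding ID.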
