Annie

Reputation: 71

tensorflow estimator evaluate much slower than training

I have a custom estimator and am trying to use some custom metrics during evaluation. However, whenever I add these metrics to evaluation via eval_metric_ops, the evaluation becomes really slow (much slower than training, which actually calculates the same metrics). If I don't add the metrics there, I can only see metrics in TensorBoard for training and not for evaluation.

What is the right way to add a custom metric to a custom estimator so that it is saved during evaluation?

This is what I have:

def compute_accuracy(preds, labels):
    # preds and labels are SparseTensors; total is the number of true label entries
    total = tf.shape(labels.values)[0]
    # Densify both, with different default values so padding never counts as a match
    preds = tf.sparse_to_dense(preds.indices, preds.dense_shape, preds.values, default_value=-1)
    labels = tf.sparse_to_dense(labels.indices, labels.dense_shape, labels.values, default_value=-2)

    # Crop both tensors to a common [rows, cols] region so they can be compared elementwise
    r = tf.shape(labels)[0]
    c = tf.minimum(tf.shape(labels)[1], tf.shape(preds)[1])
    preds = tf.slice(preds, [0, 0], [r, c])
    labels = tf.slice(labels, [0, 0], [r, c])

    preds = tf.cast(preds, tf.int32)
    labels = tf.cast(labels, tf.int32)

    # Fraction of label positions that were predicted exactly
    correct = tf.reduce_sum(tf.cast(tf.equal(preds, labels), tf.int32))
    accuracy = tf.divide(correct, total)
    return accuracy

In model_fn:
    edit_dist = tf.reduce_mean(tf.edit_distance(tf.cast(predicted_label[0], tf.int32), labels))
    accuracy = compute_accuracy(predicted_label[0], labels)
    tf.summary.scalar('edit_dist', edit_dist)
    tf.summary.scalar('accuracy', accuracy)

    metrics = {
        'accuracy': tf.metrics.mean(accuracy),
        'edit_dist': tf.metrics.mean(edit_dist),
    }

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

As requested, here is the complete model and TFRecord writer code:

def crnn_model(features, labels, mode, params):

    inputs = features['image']
    print("INPUTS SHAPE", inputs.shape)

    if mode == tf.estimator.ModeKeys.TRAIN:
        batch_size = params['batch_size']
        lr_initial = params['lr']
        lr = tf.train.exponential_decay(lr_initial, global_step=tf.train.get_global_step(),
                                        decay_steps=params['lr_decay_steps'], decay_rate=params['lr_decay_rate'],
                                        staircase=True)
        tf.summary.scalar('lr', lr)
    else:
        batch_size = params['test_batch_size']

    with tf.variable_scope('crnn', reuse=False):
        rnn_output, predicted_label, logits = CRNN(inputs, hidden_size=params['hidden_size'], batch_size=batch_size)

    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'predicted_label': predicted_label,
            'logits': logits,
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)


    loss = tf.reduce_mean(tf.nn.ctc_loss(labels=labels, inputs=rnn_output,
                                         sequence_length=23 * np.ones(batch_size),
                                         ignore_longer_outputs_than_inputs=True))
    edit_dist = tf.reduce_mean(tf.edit_distance(tf.cast(predicted_label[0], tf.int32), labels))
    accuracy = compute_accuracy(predicted_label[0], labels)

    metrics = {
        'accuracy': tf.metrics.mean(accuracy),
        'edit_dist': tf.metrics.mean(edit_dist),
    }

    tf.summary.scalar('loss', loss)
    tf.summary.scalar('edit_dist', edit_dist)
    tf.summary.scalar('accuracy', accuracy)

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

    assert mode == tf.estimator.ModeKeys.TRAIN

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        optimizer = tf.train.AdadeltaOptimizer(learning_rate=lr)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

TFRecord writer code:

def _write_fn(self, out_file, image_list, label_list, mode):
    writer = tf.python_io.TFRecordWriter(out_file)
    N = len(image_list)
    for i in range(N):
        if (i % 1000) == 0:
            print('%s Data: %d/%d records saved' % (mode, i, N))
            sys.stdout.flush()

        try:
            #print('Try image: ', image_list[i])
            image = load_image(image_list[i])
        except (ValueError, AttributeError):
            print('Ignoring image: ', image_list[i])
            continue
        label = label_list[i]
        feature = {
            'label': _int64_feature(label),
            'image': _byte_feature(tf.compat.as_bytes(image.tostring()))
        }

        example = tf.train.Example(features=tf.train.Features(feature=feature))

        writer.write(example.SerializeToString())
    writer.close()

Upvotes: 2

Views: 1969

Answers (1)

Ciprian Tomoiagă

Reputation: 4000

In the Estimator framework, everything happens in the model_fn, namely your crnn_model(features, labels, mode, params). This is why this function has such a complex signature.

The mode parameter indicates whether the model_fn is being called for training, evaluation or prediction. So, if you want to log additional summaries to TensorBoard during evaluation, you would add them under the if mode == tf.estimator.ModeKeys.EVAL section, or outside any if in the model_fn.
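A minimal sketch of that structure (TF 1.x; the dense layer, loss, metric and param names below are placeholders, not your CRNN):

import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Placeholder network; your crnn_model builds the CRNN here instead.
    logits = tf.layers.dense(features['image'], params['num_classes'])
    predictions = tf.argmax(logits, axis=-1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={'predicted_label': predictions})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    tf.summary.scalar('loss', loss)  # picked up automatically for TRAIN summaries

    if mode == tf.estimator.ModeKeys.EVAL:
        # Whatever you want to see in TensorBoard for evaluation goes in eval_metric_ops.
        metrics = {'accuracy': tf.metrics.accuracy(labels=tf.cast(labels, tf.int64),
                                                   predictions=predictions)}
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

    train_op = tf.train.AdamOptimizer(params['lr']).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)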

My first guess was that your eval is much slower because you have different batch sizes for train/eval and the eval batch size could be smaller, but you indicated this is not the case.

After a closer look at your code, and having worked with a similar model, I believe the evaluation takes longer with metrics because one of them is edit_distance(), which is implemented sequentially on the CPU. During training this op is not required, so it is not run.
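If you want to check this yourself, one option (just a sketch with placeholder paths; params is assumed to be the same dict you already pass to the estimator) is to enable device placement logging through the estimator's RunConfig and watch where the metric ops land:

import tensorflow as tf

# Sketch: log on which device every op is placed, so you can see the
# edit-distance / metric ops going to the CPU during evaluation.
run_config = tf.estimator.RunConfig(
    session_config=tf.ConfigProto(log_device_placement=True))

estimator = tf.estimator.Estimator(
    model_fn=crnn_model,          # the model_fn from the question
    model_dir='/tmp/crnn_model',  # hypothetical checkpoint directory
    params=params,
    config=run_config)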

What I suggest is that you run your train() and evaluate() in different programs, with the same model_fn() and model_dir. This way, train does not need to wait for evaluate, and evaluate will run only when necessary, i.e. when there are new checkpoints in the model_dir. If you don't have 2 GPUs for this, you can either split the GPU memory between the two processes (using a custom run config with gpu_memory_fraction=0.75 for train) or hide the GPU from evaluate() with the CUDA_VISIBLE_DEVICES='' environment variable.
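A rough sketch of that two-process setup (TF 1.x; the file names, model_dir path, and the train_input_fn/eval_input_fn/params names are placeholders, not taken from the question):

# train.py -- training process, capped to a fraction of the GPU memory
import tensorflow as tf

session_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.75))
run_config = tf.estimator.RunConfig(session_config=session_config)

estimator = tf.estimator.Estimator(
    model_fn=crnn_model,           # the model_fn from the question
    model_dir='/tmp/crnn_model',   # shared checkpoint directory (hypothetical path)
    params=params,
    config=run_config)
estimator.train(input_fn=train_input_fn)

# evaluate.py -- evaluation process, with the GPU hidden so it stays on the CPU
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''   # set before TensorFlow touches the GPU

import tensorflow as tf

estimator = tf.estimator.Estimator(
    model_fn=crnn_model,
    model_dir='/tmp/crnn_model',   # same directory as train.py
    params=params)

# Re-evaluate whenever training writes a new checkpoint into model_dir.
for ckpt in tf.contrib.training.checkpoints_iterator('/tmp/crnn_model'):
    estimator.evaluate(input_fn=eval_input_fn, checkpoint_path=ckpt)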

Upvotes: 2
