Adding report_tensor_allocations_upon_oom to cifar10_estimator example

Question

I'm running a modified version of the TensorFlow example https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator and I'm running out of memory.

The ResourceExhausted error says: Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

I've tried adding this in the obvious places in main(), but I get variants of a protobuf error saying that the report_tensor_allocations_upon_oom run option is not found.

def main(job_dir, data_dir, num_gpus, variable_strategy,
         use_distortion_for_training, log_device_placement, num_intra_threads,
         **hparams):
  # The env variable is on deprecation path, default is set to off.
  os.environ['TF_SYNC_ON_FINISH'] = '0'
  os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'

  # Session configuration.
  sess_config = tf.ConfigProto(
      allow_soft_placement=True,
      log_device_placement=log_device_placement,
      intra_op_parallelism_threads=num_intra_threads,
      report_tensor_allocations_upon_oom = True, # Nope
      gpu_options=tf.GPUOptions(
           force_gpu_compatible=True, 
           report_tensor_allocations_upon_oom = True))  # Nope

  config = cifar10_utils.RunConfig(
      session_config=sess_config, model_dir=job_dir, 
      report_tensor_allocations_upon_oom = True)  #Nope
  tf.contrib.learn.learn_runner.run(
      get_experiment_fn(data_dir, num_gpus, variable_strategy,
                        use_distortion_for_training),
      run_config=config,
      hparams=tf.contrib.training.HParams(
          is_chief=config.is_chief,
          **hparams))

Where do I add report_tensor_allocations_upon_oom = True in this example?

iga · Accepted Answer

You need to register a session run hook, which lets you pass extra arguments to the session.run() calls that the Estimator makes.

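import tensorflow as tf

class OomReportingHook(tf.train.SessionRunHook):
  def before_run(self, run_context):
    # Called before each session.run() the Estimator issues; the returned
    # RunOptions are applied to that call.
    return tf.train.SessionRunArgs(
        fetches=[],  # no extra fetches
        options=tf.RunOptions(report_tensor_allocations_upon_oom=True))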

Pass the hook in the list given to the hooks argument of the relevant Estimator method (train(), evaluate(), or predict()): https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator
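For illustration, here is a minimal sketch of how the hook might be wired into a plain Estimator.train() call. It is not the cifar10_estimator code: toy_model_fn and toy_input_fn are hypothetical stand-ins for the real model and input pipeline, and the import assumes a TensorFlow release that ships both tensorflow.compat.v1 and tf.estimator (on older 1.x releases a plain "import tensorflow as tf" should serve the same purpose).

```python
# Sketch only. Assumes a TensorFlow release that provides both the
# tensorflow.compat.v1 module and tf.estimator.
import numpy as np
import tensorflow.compat.v1 as tf


class OomReportingHook(tf.train.SessionRunHook):
    """Asks each hooked session.run() call to list allocated tensors on OOM."""

    def before_run(self, run_context):
        return tf.train.SessionRunArgs(
            fetches=[],  # no extra fetches
            options=tf.RunOptions(report_tensor_allocations_upon_oom=True))


def toy_model_fn(features, labels, mode):
    # Hypothetical stand-in for the CIFAR-10 model: a single dense layer.
    logits = tf.layers.dense(features["x"], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


def toy_input_fn():
    # Hypothetical random data in place of the CIFAR-10 input pipeline.
    x = np.random.rand(8, 4).astype(np.float32)
    y = np.random.randint(0, 2, size=8).astype(np.int32)
    return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).repeat().batch(4)


if __name__ == "__main__":
    estimator = tf.estimator.Estimator(model_fn=toy_model_fn)
    # The hook goes in the `hooks` list of the call that drives training.
    estimator.train(input_fn=toy_input_fn, hooks=[OomReportingHook()], steps=2)
```

In the cifar10_estimator script the Estimator is wrapped in an Experiment rather than trained directly, so the hook would have to be handed to whatever object ends up calling train() there; the sketch above only shows the direct case.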
