Maarten

Reputation: 4749

Is there a way of determining how much GPU memory is in use by TensorFlow?

TensorFlow tends to preallocate the entire available memory on its GPUs. For debugging, is there a way of telling how much of that memory is actually in use?

Upvotes: 30

Views: 35561

Answers (6)

George El Haber

Reputation: 66

As @V.M previously mentioned, a solution that works well is using: tf.config.experimental.get_memory_info('DEVICE_NAME')

This function returns a dictionary with two keys:

  • 'current': The current memory used by the device, in bytes
  • 'peak': The peak memory used by the device across the run of the program, in bytes

The values of these keys are the ACTUAL memory used, not the allocated memory that is reported by nvidia-smi (a usage sketch follows this list).
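A minimal sketch of reading these keys, assuming TensorFlow 2.5+ and a GPU visible as 'GPU:0' (the toy tf.function and tensor sizes below are illustrative, not part of the original answer):

    import tensorflow as tf

    @tf.function
    def do_some_work():
        # Arbitrary GPU work so that some memory actually gets used.
        a = tf.random.normal((4096, 4096))
        return tf.matmul(a, a)

    do_some_work()

    # Read the stats after the tf.function has executed (see the note below
    # about calling get_memory_info inside tf.function-decorated code).
    info = tf.config.experimental.get_memory_info('GPU:0')
    print('current:', info['current'], 'bytes')
    print('peak:   ', info['peak'], 'bytes')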

In reality, TensorFlow allocates all of a GPU's memory by default, which makes nvidia-smi useless for checking how much memory your code actually uses. Even if tf.config.experimental.set_memory_growth is set to True, TensorFlow no longer grabs the whole available memory up front, but it still allocates more memory than is actually used, and in discrete steps, e.g. 4589 MiB, then 8717 MiB, then 16943 MiB, then 30651 MiB, etc.
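For completeness, a minimal sketch of turning memory growth on, assuming it runs before any op first touches the GPU (these are the standard tf.config calls, not code from the answer):

    import tensorflow as tf

    # Must run before the GPU is initialized (i.e. before any op uses it).
    for gpu in tf.config.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)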

A small note concerning get_memory_info() is that it doesn't return correct values if used inside a tf.function()-decorated function. Thus, the 'peak' key should be read after the tf.function()-decorated function has executed, to determine the peak memory used.

For older versions of TensorFlow, tf.config.experimental.get_memory_usage('DEVICE_NAME') was the only available function, and it only returned the used memory (there was no option for determining the peak memory).

Final note: you can also consider the TensorFlow Profiler, available with TensorBoard, as @Peter mentioned.

Hope this helps :)

Upvotes: 0

Vijay Mariappan

Reputation: 17201

tf.config.experimental.get_memory_info('GPU:0')

Currently returns the following keys:

'current': The current memory used by the device, in bytes.
'peak': The peak memory used by the device across the run of the program, in bytes.
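If you want a per-section peak rather than a whole-program peak, newer TF 2.x releases also provide tf.config.experimental.reset_memory_stats to clear the counters between measurements; a small sketch, assuming a GPU visible as 'GPU:0' (the matmul is just arbitrary work for illustration):

    import tensorflow as tf

    tf.config.experimental.reset_memory_stats('GPU:0')  # clear the tracked 'peak'
    x = tf.random.normal((2048, 2048))                  # arbitrary GPU work
    y = tf.matmul(x, x)
    print(tf.config.experimental.get_memory_info('GPU:0')['peak'], 'bytes at peak')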

Upvotes: 0

eitanrich

Reputation: 321

Here's a practical solution that worked well for me:

Disable GPU memory pre-allocation using TF session configuration:

# Tell TensorFlow to allocate GPU memory on demand instead of grabbing it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

Run nvidia-smi -l (or some other utility) to monitor GPU memory consumption (a small polling sketch follows these steps).

Step through your code with the debugger until you see the unexpected GPU memory consumption.
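If you'd rather log the numbers from inside the Python process than watch nvidia-smi in a second terminal, here is a rough sketch that polls nvidia-smi via subprocess (the query flags are standard nvidia-smi options; this sketch is my addition, not part of the original answer):

    import subprocess
    import time

    def log_gpu_memory(interval_sec=1.0, iterations=10):
        """Print GPU memory usage reported by nvidia-smi every interval_sec seconds."""
        for _ in range(iterations):
            out = subprocess.check_output(
                ['nvidia-smi', '--query-gpu=memory.used,memory.total',
                 '--format=csv,noheader'])
            print(out.decode().strip())
            time.sleep(interval_sec)

    log_gpu_memory()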

Upvotes: 9

Steve

Reputation: 71

There's some code in tensorflow.contrib.memory_stats that will help with this:

from tensorflow.contrib.memory_stats.python.ops.memory_stats_ops import BytesInUse
with tf.device('/device:GPU:0'):  # Replace with device you are interested in
  bytes_in_use = BytesInUse()
with tf.Session() as sess:
  print(sess.run(bytes_in_use))
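If you want the peak rather than the current figure, the same contrib module also exposes a MaxBytesInUse op; this is an assumption that it is present in your TF 1.x build, so check tensorflow.contrib.memory_stats in your version:

    import tensorflow as tf
    from tensorflow.contrib.memory_stats.python.ops.memory_stats_ops import MaxBytesInUse

    with tf.device('/device:GPU:0'):
        max_bytes_in_use = MaxBytesInUse()
    with tf.Session() as sess:
        print(sess.run(max_bytes_in_use))  # peak bytes allocated on the device so far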

Upvotes: 7

Peter

Reputation: 108

The TensorFlow profiler has an improved memory timeline that is based on real GPU memory allocator information: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/profiler#visualize-time-and-memory

Upvotes: 2

Yao Zhang

Reputation: 5781

(1) There is some limited support with Timeline for logging memory allocations. Here is an example of its usage:

    # timeline comes from tensorflow.python.client; merged, train_step, feed_dict,
    # train_writer and i are defined in the "MNIST with summaries" example mentioned below.
    from tensorflow.python.client import timeline

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    summary, _ = sess.run([merged, train_step],
                          feed_dict=feed_dict(True),
                          options=run_options,
                          run_metadata=run_metadata)
    train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
    train_writer.add_summary(summary, i)
    print('Adding run metadata for', i)
    tl = timeline.Timeline(run_metadata.step_stats)
    print(tl.generate_chrome_trace_format(show_memory=True))
    trace_file = tf.gfile.Open(name='timeline', mode='w')
    trace_file.write(tl.generate_chrome_trace_format(show_memory=True))

You can give this code a try with the MNIST example (mnist with summaries).

This will generate a tracing file named timeline, which you can open with chrome://tracing. Note that this only gives approximate GPU memory usage statistics. It basically simulates a GPU execution, but doesn't have access to the full graph metadata. It also can't know how many variables have been assigned to the GPU.

(2) For a very coarse measure of GPU memory usage, nvidia-smi will show the total device memory usage at the time you run the command.

nvprof can show the on-chip shared memory usage and register usage at the CUDA kernel level, but doesn't show the global/device memory usage.

Here is an example command: nvprof --print-gpu-trace matrixMul

And more details here: http://docs.nvidia.com/cuda/profiler-users-guide/#abstract

Upvotes: 12
