Reputation: 48576
I'm trying to get a rough handle on the GPU memory footprint of my TensorFlow deep learning models, and am relying on a heuristic I've found, which suggests:
The largest bottleneck to be aware of when constructing ConvNet architectures is the memory bottleneck. Many modern GPUs have a limit of 3/4/6GB memory, with the best GPUs having about 12GB of memory. There are three major sources of memory to keep track of:
From the intermediate volume sizes: These are the raw number of activations at every layer of the ConvNet, and also their gradients (of equal size). Usually, most of the activations are on the earlier layers of a ConvNet (i.e. first Conv Layers). These are kept around because they are needed for backpropagation, but a clever implementation that runs a ConvNet only at test time could in principle reduce this by a huge amount, by only storing the current activations at any layer and discarding the previous activations on layers below.
From the parameter sizes: These are the numbers that hold the network parameters, their gradients during backpropagation, and commonly also a step cache if the optimization is using momentum, Adagrad, or RMSProp. Therefore, the memory to store the parameter vector alone must usually be multiplied by a factor of at least 3 or so.
Every ConvNet implementation has to maintain miscellaneous memory, such as the image data batches, perhaps their augmented versions, etc.
Once you have a rough estimate of the total number of values (for activations, gradients, and misc), the number should be converted to size in GB. Take the number of values, multiply by 4 to get the raw number of bytes (since every floating point is 4 bytes, or maybe by 8 for double precision), and then divide by 1024 multiple times to get the amount of memory in KB, MB, and finally GB. If your network doesn’t fit, a common heuristic to “make it fit” is to decrease the batch size, since most of the memory is usually consumed by the activations.
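For example, here is the kind of back-of-the-envelope calculation I understand the heuristic to describe (the layer sizes and parameter count below are made up purely to illustrate the arithmetic):

```python
# Back-of-the-envelope memory estimate following the quoted heuristic.
# All sizes below are invented for illustration only.
batch_size = 64
activation_counts = [64 * 224 * 224, 128 * 112 * 112, 256 * 56 * 56]  # values per example, per layer
param_count = 5_000_000                                               # total trainable parameters

activations = batch_size * sum(activation_counts)   # forward activations
gradients = activations                              # their gradients (equal size, at training time)
params = 3 * param_count                             # parameters + their gradients + optimizer step cache

total_values = activations + gradients + params
total_bytes = total_values * 4                       # float32 -> 4 bytes per value
print(f"~{total_bytes / 1024 / 1024 / 1024:.2f} GB")
```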
But I'm unsure of a few things:
Upvotes: 1
Views: 1270
Reputation: 27050
You have a single model that is trained using batches of samples.
A single batch is composed of multiple inputs.
These inputs are processed in parallel by the model.
Thus, if your batch contains a certain number of elements, every element is transferred from the CPU (where the input Queues are) to the GPU.
The GPU then computes the forward pass for every single element of the input batch, using the model in its state at time step t (think of the model with its parameters frozen at step t).
Then the network outputs are accumulated in a vector and the backpropagation step is computed.
The gradients are thus calculated (backward pass) for every single element of the batch, again using the model at time step t, accumulated in a vector and averaged.
Using this average, the model parameters are updated and the model enters the state t+1.
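As a minimal illustration of that training step, here is a sketch using the modern tf.GradientTape API (the model, batch, and optimizer below are arbitrary placeholders): a forward pass with the parameters at state t, gradients averaged over the batch, and the update that moves the model to state t+1.

```python
import tensorflow as tf

# Hypothetical tiny model and synthetic batch, just to show one training step.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

images = tf.random.uniform((64, 32, 32, 3))                   # one batch, copied to the GPU
labels = tf.random.uniform((64,), maxval=10, dtype=tf.int32)

with tf.GradientTape() as tape:
    logits = model(images, training=True)   # forward pass with parameters at state t
    loss = loss_fn(labels, logits)          # per-element losses reduced to a batch mean
grads = tape.gradient(loss, model.trainable_variables)            # backward pass, averaged over the batch
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # model moves to state t+1
```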
As a rule of thumb, everything that is sequential by nature stays on the CPU (think of input threads, queues, processing of single input values, ...). However, everything that the network should process is then transferred from the CPU to the GPU.
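A minimal sketch of that CPU/GPU split using the tf.data API (the answer above refers to the older input queues; the synthetic data and shapes here are just placeholders):

```python
import tensorflow as tf

# Synthetic "dataset" living in host (CPU) memory, purely for illustration.
images = tf.random.uniform((256, 64, 64, 3))
labels = tf.random.uniform((256,), maxval=10, dtype=tf.int32)

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .map(lambda x, y: (tf.image.random_flip_left_right(x), y),
                num_parallel_calls=tf.data.AUTOTUNE)   # per-element CPU-side preprocessing
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))                # overlap CPU work with GPU compute

# Each batch yielded here is what gets copied from CPU memory to the GPU.
for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape)   # (32, 64, 64, 3)
```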
The miscellaneous part is a little bit confusing. I guess the author is talking about data augmentation and the fact that a single input can be augmented in infinitely many ways. You have to take into account that if you're applying transformations to a batch of inputs (e.g., random brightness to a whole batch of images), the data has to be transferred from the CPU to the GPU so the transformations can be computed, and the augmented versions are stored in GPU memory before processing. However, the transfer operation would be done anyway; you just lose some computation time (for the preprocessing, of course), and the allocated memory will be the same.
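To illustrate that last point, a small sketch (the shapes are arbitrary): the augmented batch has the same shape and dtype as the original, so it occupies the same amount of GPU memory; only the extra preprocessing compute is added.

```python
import tensorflow as tf

batch = tf.random.uniform((64, 224, 224, 3))                  # synthetic image batch
augmented = tf.image.random_brightness(batch, max_delta=0.3)  # same shape -> same footprint

print(batch.shape, augmented.shape)   # (64, 224, 224, 3) for both
print(batch.dtype, augmented.dtype)   # float32 for both
```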
Upvotes: 1