Reputation: 2356
I have two questions:
(1) How does TensorFlow allocate GPU memory when using only one GPU? I have an implementation of a 2D convolution like this (the whole graph is placed on the GPU):
def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
    with tf.variable_scope(name):
        n = filter_size * filter_size * out_filters
        kernel = tf.get_variable(
            '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
            initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n)),
        )
        return tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # another option
        # x = tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # return x
The other option in the comments performs the same operation but assigns the result to the Python name x before returning it. In this case, will TF allocate more GPU memory?
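(For question (1), a quick way to check is to build each variant in a fresh graph and compare the node counts. The snippet below is only a minimal sketch: the standalone count_nodes helper, the fixed 3x3x3x16 filter shape and the placeholder input are illustrative assumptions, not part of the code above.)

import numpy as np
import tensorflow as tf  # TF 1.x style API, matching the code above

def count_nodes(return_directly):
    """Builds one conv layer in a fresh graph and returns its node count."""
    g = tf.Graph()
    with g.as_default():
        images = tf.placeholder(tf.float32, [None, 32, 32, 3])
        n = 3 * 3 * 16
        kernel = tf.get_variable(
            'kernel', [3, 3, 3, 16], tf.float32,
            initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n)))
        if return_directly:
            tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
        else:
            # assigning to 'x' adds a Python name, not an extra graph node
            x = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
        return len(g.as_graph_def().node)

# identical node counts => the second variant needs no extra GPU memory
print(count_nodes(True) == count_nodes(False))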
(2) When using multiple GPUs, I'd like to use a list to gather the results from the GPUs. The implementation is below:
def _conv(self, name, input, filter_size, in_filters, out_filters, strides, trainable=True):
    assert type(input) is list
    assert len(input) == FLAGS.gpu_num
    n = filter_size * filter_size * out_filters
    output = []
    for i in range(len(input)):
        with tf.device('/gpu:%d' % i):
            with tf.variable_scope(name, reuse=i > 0):
                kernel = tf.get_variable(
                    '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
                    initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n))
                )
                output.append(tf.nn.conv2d(input[i], kernel, strides, padding='SAME'))
    return output
Will TF allocate more memory because of the use of the list? Is output (the list) attached to some GPU device? I am asking because when I train the CNNs on two GPUs with this implementation, the program uses much more GPU memory than when using one GPU. I think there is something I missed or misunderstood.
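(A quick check for the device question, as a hedged sketch: model and inputs are hypothetical names standing in for however _conv is actually called. The Python list is just a host-side container; each tensor inside it carries its own device string, which can be printed directly.)

# hypothetical usage; 'model' and 'inputs' stand in for the real objects
outputs = model._conv('conv1', inputs, 3, 3, 16, [1, 1, 1, 1])
for i, out in enumerate(outputs):
    # each tensor reports the device it was placed on; the list itself is
    # only a Python object on the host and occupies no GPU memory
    print(i, out.name, out.device)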
Upvotes: 1
Views: 1162
Reputation: 2356
Use this code to check each tensor and the device it is attached to:
for n in tf.get_default_graph().as_graph_def().node:
    print(n.name, n.device)
So the answers to these two questions are:
(1) No.
(2) If I'd like to gather intermediate data across GPUs and that data is then used to compute gradients, there will be a problem, because computing the gradients consumes memory too. Whenever data is accessed across GPUs, additional memory is allocated.
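(One common way to avoid that extra allocation, sketched roughly below under the assumption of a tower-style setup: loss_fn, inputs_per_gpu and optimizer are placeholders for whatever the real model uses, not code from the question. Each tower computes both its forward pass and its gradients on its own GPU, and only the much smaller gradients are moved to the CPU for averaging.)

import tensorflow as tf  # TF 1.x style API

def build_multi_gpu_train_op(loss_fn, inputs_per_gpu, optimizer):
    """loss_fn(x) builds one tower and returns a scalar loss for input x."""
    tower_grads = []
    for i, x in enumerate(inputs_per_gpu):
        with tf.device('/gpu:%d' % i):
            with tf.variable_scope('model', reuse=i > 0):
                loss = loss_fn(x)
                # gradients are created on the same GPU as the forward pass,
                # so intermediate activations never have to cross devices
                tower_grads.append(optimizer.compute_gradients(loss))
    with tf.device('/cpu:0'):
        averaged = []
        # zip(*tower_grads) groups the (grad, var) pairs of each shared
        # variable across the towers
        for grads_and_vars in zip(*tower_grads):
            grads = [g for g, _ in grads_and_vars if g is not None]
            if not grads:
                continue
            var = grads_and_vars[0][1]
            averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
        return optimizer.apply_gradients(averaged)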
Upvotes: 0