Yomu

Reputation: 33

TensorFlow performance drop for second calculation

I am pretty new to TensorFlow 2.0 and I was thinking of using its GPU processing features for some matrix calculations. So I tried it on some big matrix multiplications while measuring the performance. When I run it on one big matrix it is very fast, but when I run it afterwards on other matrices it gets really slow. Initializing even very small tensors is slow then. Is this happening because the matrices use too much memory? Even if I delete the variables with Python's del, the problem is still there.

My Python code:

import tensorflow as tf
import numpy as np
import time


a = np.ones((9000,4000))
b = np.ones((4000,9000))

a3 = np.ones((7,9000,4000))
b3 = np.ones((7,4000,9000))

with tf.device('/gpu:0'):
    
    # first multiplication

    a2 = tf.convert_to_tensor(a)
    b2 = tf.convert_to_tensor(b)

    start = time.time()
    c = tf.matmul([b2,b2,b2,b2,b2,b2,b2], [a2,a2,a2,a2,a2,a2,a2])
    print("first multiplication time: ", time.time() - start)
    del c, a2, b2

    # second multiplication

    a3 = tf.convert_to_tensor(a3)
    b3 = tf.convert_to_tensor(b3)

    start = time.time()
    c = tf.matmul(b3, a3)
    print("second multiplication time: ", time.time() - start)
    del c, a3, b3

    # third multiplication

    start = time.time()
    n = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='n')
    m = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='m')
    print("constant init time: ",time.time() - start)

    c = tf.matmul([n,n], [m,m])
    print("constant init plus third multiplication time: ", time.time() - start)

The output (without TensorFlow's informational log lines):

first multiplication time:  0.7032458782196045
2021-02-07 20:40:36.004254: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
2021-02-07 20:40:36.588404: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
second multiplication time:  6.460264682769775
constant init time:  6.7629804611206055
constant init plus third multiplication time:  6.76327919960022

When I comment out the first multiplication, the output changes to:

2021-02-07 20:44:29.165061: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
2021-02-07 20:44:29.763323: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
second multiplication time:  0.9040727615356445
constant init time:  7.273072242736816
constant init plus third multiplication time:  7.273530006408691

And when I only run the third calculation:

constant init time:  0.0499725341796875
constant init plus third multiplication time:  0.4284539222717285

I would really like to understand what is happening and maybe even find a way to improve it.

Thank you for your help!

Upvotes: 2

Views: 49

Answers (1)

Karan Dhingra

Reputation: 133

It is happening because you are not transferring the tensors back from the GPU to the CPU, so they keep occupying GPU memory. I am not sure about del; technically it should work in eager mode, but there was a bug related to a memory leak (I am not sure whether it has been fixed or not).
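To see that the results really are still resident on the GPU, you can query the allocator between steps. A minimal sketch, assuming a TensorFlow version new enough to have tf.config.experimental.get_memory_info:

import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.ones((9000, 4000))
    b = tf.ones((4000, 9000))
    c = tf.matmul(b, a)

    # 'current' = bytes currently allocated on the device
    print(tf.config.experimental.get_memory_info('GPU:0')['current'])

    del c
    print(tf.config.experimental.get_memory_info('GPU:0')['current'])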

If you call .numpy() on the result of tf.matmul,

c = tf.matmul(b3, a3).numpy()  # .numpy() copies the result back to the CPU

you should get the correct times:

first multiplication time:  8.76913070678711
second multiplication time:  8.516901731491089
constant init time:  0.0011458396911621094
constant init plus third multiplication time:  0.0024268627166748047
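For what it's worth, here is how I would wrap the measurements so that every timing includes the actual GPU work; timed_matmul is just an illustrative name, not part of the TensorFlow API:

import tensorflow as tf
import time

def timed_matmul(x, y, label):
    start = time.time()
    # .numpy() blocks until the GPU has finished and the result has been
    # copied back to the host, so the elapsed time covers the computation
    result = tf.matmul(x, y).numpy()
    print(label, "multiplication time: ", time.time() - start)
    return result

Used, for example, as c = timed_matmul(b3, a3, "second").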

Let me know if anything is missing.

Upvotes: 1
