Lana
Lana

Reputation: 51

Tensorflow: Memory growth cannot differ between GPU devices | How to use multi-GPU with tensorflow

I am trying to run a keras code on a GPU node within a cluster. The GPU node has 4 GPUs per node. I made sure to have all 4 GPUs within the GPU node available for my use. I run the code below to let tensorflow use the GPU:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
        

The 4 GPUs available get listed in the output. However, I got the following error when running the code:

Traceback (most recent call last):
  File "/BayesOptimization.py", line 20, in <module>
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/framework/config.py", line 439, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1368, in list_logical_devices
    self.ensure_initialized()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 511, in ensure_initialized
    config_str = self.config.SerializeToString()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1015, in config
    gpu_options = self._compute_gpu_options()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1074, in _compute_gpu_options
    raise ValueError("Memory growth cannot differ between GPU devices")
ValueError: Memory growth cannot differ between GPU devices

Shouldn't the code list all the available gpus and set memory growth to true for each one?

I am currently using tensorflow libraries and python 3.97:

tensorflow                2.4.1           gpu_py39h8236f22_0
tensorflow-base           2.4.1           gpu_py39h29c2da4_0
tensorflow-estimator      2.4.1              pyheb71bc4_0
tensorflow-gpu            2.4.1                h30adc30_0

Any idea what the problem is and how to solve it? Thanks in advance!

Upvotes: 5

Views: 5806

Answers (3)

Sami Wood
Sami Wood

Reputation: 525

I think its just a matter of being careful with indentations. For me your code does not work:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

BUT if you wait before calling logical_gpus = tf.config.list_logical_devices('GPU') until the end then it works. SO change to

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

I think what is happening is you need to set ALL VISIBLE GPUs in the config before calling any other accessors to the config. So your other option is to call tf.config.set_visible_devices FIRST. If you limit to one GPU you dont need the loop if you do two or more you need to be careful to call set_memory_growth on all of them before accessing the config or running any tf code.

Upvotes: 0

ykgautam09
ykgautam09

Reputation: 1

In case of multi-GPU devices memory growth should be constant through out all available GPUs. Either set it true for all GPUs or keep it false.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

Tensorflow GPU documentation

Upvotes: 0

Kazekiri
Kazekiri

Reputation: 21

Try only: os.environ["CUDA_VISIBLE_DEVICES"]="0" instead of tf.config.experimental.set_memory_growth. This works for me.

Upvotes: 2

Related Questions