Nick Skywalker

Reputation: 1087

Using TensorFlow when a session is already running on the GPU

I am training a neural network with TensorFlow 2 (GPU) on my local machine. In parallel, I'd like to run some other TensorFlow code (just loading a model and saving its graph).

When loading the model I get a CUDA error. How can I use TensorFlow 2 on the CPU to load and save a model while another instance of TensorFlow is training on the GPU?

    132         self._config = config
    133         self._hyperparams['feature_extractor'] = self._get_feature_extractor(hyperparams['feature_extractor'])
--> 134         self._input_shape_tensor = tf.constant([input_shape[0], input_shape[1]])
    135         self._build(**self._hyperparams)
    136         # save parameter dict for serialization

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in constant(value, dtype, shape, name)
    225   """
    226   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 227                         allow_broadcast=True)
    228 
    229 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    233   ctx = context.context()
    234   if ctx.executing_eagerly():
--> 235     t = convert_to_eager_tensor(value, ctx, dtype)
    236     if shape is None:
    237       return t

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     93     except AttributeError:
     94       dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 95   ctx.ensure_initialized()
     96   return ops.EagerTensor(value, ctx.device_name, dtype)
     97 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py in ensure_initialized(self)
    490         if self._default_is_async == ASYNC:
    491           pywrap_tensorflow.TFE_ContextOptionsSetAsync(opts, True)
--> 492         self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
    493       finally:
    494         pywrap_tensorflow.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Upvotes: 0

Views: 1067

Answers (3)

Selmor

Reputation: 26

It took me a while to find this answer:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf

Starting your script with those lines lets you run your TensorFlow code on the CPU (the point is simply to avoid touching CUDA) while a heavy GPU training job runs at the same time. Note that the environment variable must be set before TensorFlow is imported.
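
For example, to load a model and save its graph on the CPU while training runs in another process, a minimal sketch could look like this (the Keras calls and file paths are assumptions for illustration, not from the question):

import os

# Hide all CUDA devices *before* TensorFlow is imported,
# so the CUDA runtime is never initialized in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

model = tf.keras.models.load_model("my_model.h5")  # hypothetical path
tf.saved_model.save(model, "exported_model")       # writes the graph as a SavedModel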

Upvotes: 1

Vladimir Sotnikov

Reputation: 1489

By default, TensorFlow 2 allocates nearly all of your GPU:0 memory at startup. If you set

import tensorflow as tf
tf.config.experimental.set_memory_growth(tf.config.experimental.list_physical_devices('GPU')[0], True)

you'll be able to use your GPU for both tasks (provided, of course, that the GPU has enough free memory for both, and that the training process enables memory growth as well).
If you want more control over GPU memory usage, you can create a virtual GPU with a fixed memory limit:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 2 GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]) # limit in megabytes
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
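
If you want to verify that the cap is respected, newer TensorFlow versions (2.5 and later; an assumption beyond the experimental API used above) also expose the allocator state:

x = tf.random.normal([1024, 1024])  # any GPU op; forces the context (and the cap) to initialize
info = tf.config.experimental.get_memory_info('GPU:0')
print(info['current'], info['peak'])  # bytes currently allocated / peak allocation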

Upvotes: 0

Rishabh Sahrawat

Reputation: 2507

You are loading the model on the GPU, and since the GPU is already being used for training, it runs out of memory. You need to place the loading on the CPU instead. Try loading the model inside a device scope:

with tf.device('/CPU:0'):
    ...  # your model-loading code goes here
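
One caveat: in TensorFlow 2 the CUDA context is created for the whole process as soon as the first eager op runs, even for ops placed on the CPU (that is exactly the TFE_NewContext failure in the traceback above). A more robust sketch also hides the GPUs at runtime (this must run before any op; the model path is hypothetical):

import tensorflow as tf

# Make the GPUs invisible to this process before anything touches them;
# combined with tf.device, the whole load stays on the CPU.
tf.config.experimental.set_visible_devices([], 'GPU')

with tf.device('/CPU:0'):
    model = tf.keras.models.load_model('my_model.h5')  # hypothetical path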

Upvotes: 0
