Reputation: 1087
I am training a neural network with TensorFlow 2 (GPU) on my local machine, and I'd like to run some TensorFlow code in parallel (just loading a model and saving its graph). When loading the model I get a CUDA error. How can I use TensorFlow 2 on the CPU to load and save a model while another instance of TensorFlow is training on the GPU? Here is the traceback:
    132         self._config = config
    133         self._hyperparams['feature_extractor'] = self._get_feature_extractor(hyperparams['feature_extractor'])
--> 134         self._input_shape_tensor = tf.constant([input_shape[0], input_shape[1]])
    135         self._build(**self._hyperparams)
    136         # save parameter dict for serialization

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in constant(value, dtype, shape, name)
    225   """
    226   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 227                         allow_broadcast=True)
    228 
    229 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    233   ctx = context.context()
    234   if ctx.executing_eagerly():
--> 235     t = convert_to_eager_tensor(value, ctx, dtype)
    236     if shape is None:
    237       return t

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     93   except AttributeError:
     94     dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 95   ctx.ensure_initialized()
     96   return ops.EagerTensor(value, ctx.device_name, dtype)
     97 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py in ensure_initialized(self)
    490         if self._default_is_async == ASYNC:
    491           pywrap_tensorflow.TFE_ContextOptionsSetAsync(opts, True)
--> 492         self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
    493       finally:
    494         pywrap_tensorflow.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
Upvotes: 0
Views: 1067
Reputation: 26
It took me a while to find this answer:
import os
# Hide all GPUs from CUDA; must run before the first `import tensorflow`
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
Starting your script with those lines forces this TensorFlow process to run on the CPU (avoiding CUDA entirely is the solution, obviously), while the heavy GPU-bound training keeps running in the other process.
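For illustration, here is a minimal sketch of the whole pattern; the SavedModel path my_model and the use of tf.keras for loading and saving are assumptions for this example, not part of the original answer:
import os

# Hide the GPU before TensorFlow is imported, so this process never touches CUDA
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

# Hypothetical path to a model written by the training process
model = tf.keras.models.load_model("my_model")

# Re-save the model (and thus its graph) on the CPU; GPU training is unaffected
model.save("my_model_copy")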
Upvotes: 1
Reputation: 1489
By default, TensorFlow 2 allocates roughly 90% of your GPU:0 memory at startup. If you set
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
you'll be able to use your GPU for both tasks (provided, of course, that your GPU has enough memory for both).
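If there may be several GPUs, the same setting can be applied to each of them, wrapped in a try/except as in TensorFlow's own examples, since memory growth must be configured before the GPUs are initialized; a minimal sketch:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
try:
    for gpu in gpus:
        # Allocate GPU memory on demand instead of reserving it all at startup
        tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError as e:
    # Memory growth must be set before the GPUs have been initialized
    print(e)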
If you want finer control over GPU memory usage, you can create a virtual GPU with a hard-coded memory limit:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 2 GB of memory on the first GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])  # limit in megabytes
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
Upvotes: 0
Reputation: 2507
You are loading the model on the GPU, and since the GPU is already being used for training, it runs out of memory. You need to place the loading on the CPU. Try loading the model inside
with tf.device('/CPU:0'):
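For example, a minimal sketch, assuming a Keras model saved at a hypothetical path model_dir:
import tensorflow as tf

with tf.device('/CPU:0'):
    # Pin the load (variable creation and any ops it runs) to the CPU
    model = tf.keras.models.load_model('model_dir')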
Upvotes: 0