Reputation: 10389
I have a Python server application that provides TensorFlow / Keras model inference services. Multiple different models can be loaded and used at the same time, for multiple different clients. A client can request to load another model, but this has no effect on the other clients (i.e. their models stay in memory and in use, so each client can ask to load another model regardless of the state of any other client).
The logic and implementation work; however, I am not sure how to correctly free memory in this setup. When a client asks for a new model to be loaded, the previously loaded model is simply deleted from memory (via the Python del statement), and then the new model is loaded via tensorflow.keras.models.load_model().
From what I read in the Keras documentation, one might want to clear a Keras session in order to free memory by calling tf.keras.backend.clear_session(). However, that seems to release all TF memory, which is a problem in my case, since other Keras models for other clients are still in use at the same time, as described above.
Moreover, it seems I cannot put each model into its own process, since I cannot access the single GPU from different running processes in parallel (or at all).
So in other words: when loading a new TensorFlow / Keras model while other models are also in memory and in use, how can I free the TF memory of the previously loaded model without interfering with the other currently loaded models?
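To illustrate, here is a minimal sketch of what the per-client swap currently looks like (the ModelRegistry class and the names client_id / model_path are simplified for this question):
import gc
import tensorflow as tf

class ModelRegistry:
    """Keeps one loaded Keras model per client."""

    def __init__(self):
        self._models = {}  # client_id -> tf.keras.Model

    def load(self, client_id, model_path):
        # Drop this client's previous model before loading the new one.
        old = self._models.pop(client_id, None)
        if old is not None:
            del old       # remove the last Python reference
            gc.collect()  # but this alone does not seem to release the GPU memory
        self._models[client_id] = tf.keras.models.load_model(model_path)

    def predict(self, client_id, inputs):
        return self._models[client_id].predict(inputs)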
Upvotes: 3
Views: 10975
Reputation: 1824
When a TensorFlow session starts, it tries to allocate all of the available GPU memory. This is what prevents multiple processes from running sessions. The ideal way to stop this is to ensure that the TF session only allocates a part of the memory. From the docs, there are a couple of ways to do this, depending on your TF version:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
for tf 2.0/2.1
import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)
for tf 1.* (allocate 30% of the GPU memory per process)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
Alternatively, you can put a hard cap on memory usage by splitting the physical GPU into virtual devices with fixed memory limits:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Create two virtual GPUs with 1GB of memory each on the first physical GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
             tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
Now you have to manually control placement using the with tf.device() context manager:
gpus = tf.config.experimental.list_logical_devices('GPU')
if gpus:
    # Replicate your computation on the multiple (logical) GPUs
    c = []
    for gpu in gpus:
        with tf.device(gpu.name):
            a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
            b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
            c.append(tf.matmul(a, b))
    with tf.device('/CPU:0'):
        matmul_sum = tf.add_n(c)
    print(matmul_sum)
Using this, you won't run out of memory and can run multiple processes at once.
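To connect this back to the question: once memory growth is enabled, each model can live in its own process, and the GPU memory is returned to the system when that process exits. A minimal sketch of such a worker (run_worker, model_path and inputs are illustrative names, not a fixed API):
import tensorflow as tf

def run_worker(model_path, inputs):
    # Enable memory growth before any GPU op so this process only
    # grabs the GPU memory it actually needs.
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)

    model = tf.keras.models.load_model(model_path)
    return model.predict(inputs)
Killing such a worker frees its model's memory without touching the models held by other processes.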
Upvotes: 5
Reputation: 2621
You can fork a new process (kernel) per customer. Each process executes its operations in an environment that is separated from the others, which is a safer and more isolated approach.
I created a basic scenario with two parts. The main part is responsible for starting, driving and killing the processes. The client part is responsible for executing the operations the server orders. Each client waits for orders via HTTP requests.
main.py
import subprocess
import sys

import requests


class ClientOperator:
    def __init__(self, name, port, model):
        self.name = name
        self.port = port
        self.proc = subprocess.Popen([sys.executable, 'client.py',
                                      f'--port={port}', f'--model={model}'])

    def process(self, a, b):
        response = requests.get(f'http://localhost:{self.port}/process',
                                params={'a': a, 'b': b}).json()
        print(f'{self.name} process {a} + {b} = {response}')

    def close(self):
        print(f'{self.name} is closing')
        self.proc.terminate()


customer1 = ClientOperator('John', 20001, 'model1.hdf5')
customer2 = ClientOperator('Oscar', 20002, 'model2.hdf5')

customer1.process(5, 10)
customer2.process(4, 6)

# stop customer1
customer1.close()
client.py
import argparse

from flask import Flask, request, jsonify

# parse arguments
parser = argparse.ArgumentParser()
parser.add_argument('--port', '-p', type=int)
parser.add_argument('--model', '-m', type=str)
args = parser.parse_args()

model = args.model

app = Flask(__name__)


@app.route('/process', methods=['GET'])
def process():
    result = int(request.args['a']) + int(request.args['b'])
    return jsonify({'result': result, 'model': model})


if __name__ == '__main__':
    app.run(host="localhost", port=args.port)
Output:
$ python main.py
* Serving Flask app "client" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://localhost:20002/ (Press CTRL+C to quit)
* Serving Flask app "client" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://localhost:20001/ (Press CTRL+C to quit)
127.0.0.1 - - [22/Jan/2021 16:31:26] "GET /process?a=5&b=10 HTTP/1.1" 200 -
John process 5 + 10 = {'model': 'model1.hdf5', 'result': 15}
127.0.0.1 - - [22/Jan/2021 16:31:27] "GET /process?a=4&b=6 HTTP/1.1" 200 -
Oscar process 4 + 6 = {'model': 'model2.hdf5', 'result': 10}
John is closing
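The /process endpoint above only adds two numbers to keep the example small. Here is a sketch of how client.py could instead load the Keras model named by --model and serve predictions (the /predict route and the JSON input format are assumptions for illustration, not part of the example above):
import argparse

import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

parser = argparse.ArgumentParser()
parser.add_argument('--port', '-p', type=int)
parser.add_argument('--model', '-m', type=str)
args = parser.parse_args()

# Load the Keras model once, when this client process starts.
model = tf.keras.models.load_model(args.model)

app = Flask(__name__)


@app.route('/predict', methods=['POST'])
def predict():
    # Assumes the request body is a JSON list of input rows for the model.
    inputs = np.array(request.get_json())
    outputs = model.predict(inputs)
    return jsonify({'outputs': outputs.tolist()})


if __name__ == '__main__':
    app.run(host="localhost", port=args.port)
Because each customer has its own process, terminating it (customer1.close() above) frees that model's GPU memory without affecting the other customers. If the processes share a single GPU, you would also enable memory growth in each of them, as shown in the other answer.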
Upvotes: -1