j_0101

Reputation: 167

How does Multi-GPU scale in terms of memory allocation?

I have a PC with the following specs:

My question is: when I run my Keras training program on roughly 60k images (on GPU:1), the program loads the images and the data matrix comes to 12922.20 MB:

[screenshot of the program's output]

After this, the program does nothing for about a minute and is then killed automatically. The same code trains fine on GPU:1 with 10k images.

  1. Could this be because my GPU:1 can store only 11 GB while the data size is around 12 GB?
  2. Would parallelising GPU:1 and GPU:0 solve my problem? If so, would that give me 16 GB of VRAM (8+8) or 19 GB (11+8)?
  3. Am I doing something wrong? The post I am referring to is: https://www.pyimagesearch.com/2018/09/10/keras-tutorial-how-to-get-started-with-keras-deep-learning-and-python with some minor modifications.

I did try to search online and on SO, but I couldn't find or understand much information on how GPU memory is allocated and scales when using multiple GPUs with Keras.
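For reference, this is how I estimate the footprint of a dense image matrix (the 64×64 RGB resolution below is just a placeholder, since I haven't listed my actual image size):

```python
import numpy as np

def matrix_size_mb(n_images, height, width, channels, dtype=np.float32):
    """Estimate the in-memory size of a dense image matrix in MB."""
    bytes_per_element = np.dtype(dtype).itemsize
    return n_images * height * width * channels * bytes_per_element / (1024 ** 2)

# Hypothetical 64x64 RGB float32 images (not necessarily my real resolution):
print(matrix_size_mb(10_000, 64, 64, 3))  # 468.75 MB for 10k images
print(matrix_size_mb(60_000, 64, 64, 3))  # 2812.5 MB for 60k images
```

Scaling from 10k to 60k images multiplies the matrix size by six, regardless of resolution, which is why the larger run behaves so differently.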

Any help would be appreciated!

Upvotes: 2

Views: 1723

Answers (1)

Timbus Calin

Reputation: 15033

I would first recommend that you check the memory usage when training on a single GPU; I suspect that your dataset is loaded not into GPU memory but into host RAM.
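One quick way to check this: a NumPy array returned by an image loader lives in host RAM, not in VRAM, and its `.nbytes` attribute reports exactly how much (the array shape here is purely illustrative):

```python
import numpy as np

# Illustrative stand-in for the loaded image matrix (shape is made up):
data = np.zeros((1000, 64, 64, 3), dtype=np.float32)

# .nbytes reports the size of the buffer in host RAM, not on the GPU
print(f"{data.nbytes / 1024**2:.2f} MB")  # -> 46.88 MB
```

If this number matches the 12922.20 MB reported by your program, the matrix is sitting in system RAM, and the GPU's 11 GB limit is not the first thing to worry about.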

You can try the following:

1. Make the system expose only one of the video cards:

       import os
       # Expose only one GPU to the process: "0" or "1"
       os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or "1"

Check the exact mapping (i.e. that TensorFlow sees your GPU):

    tf.config.list_physical_devices('GPU')

2. Now verify exactly how much VRAM is used in this case: in the terminal, run nvidia-smi to check how much GPU memory has been allotted; you can also monitor it continuously with watch -n K nvidia-smi, which refreshes the readout every K seconds.

3. When you use multiple GPUs, make sure to use tf.distribute.MirroredStrategy() and declare your model creation + compile logic like below:

     strategy = tf.distribute.MirroredStrategy()
     print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
    
     # Open a strategy scope.
     with strategy.scope():
       # Everything that creates variables should be under the strategy scope.
       # In general this is only model construction & `compile()`.
       model = Model(...)
       model.compile(...)
    

Outside the strategy scope, train and evaluate as usual:

    model.fit(train_dataset, validation_data=val_dataset, ...)
    model.evaluate(test_dataset)
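One caveat on your VRAM question: MirroredStrategy is data parallelism, so the model is replicated on every GPU and the cards' memory does not simply pool into 16 or 19 GB. If the dataset itself is too large to hold in memory at once, feeding the model batch by batch sidesteps the problem entirely. A minimal sketch with a plain Python generator (the zero arrays stand in for images loaded from disk):

```python
import numpy as np

def batch_generator(n_samples, batch_size, image_shape=(64, 64, 3)):
    """Yield one (x, y) batch at a time, so only batch_size images sit in RAM."""
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        # Placeholder: in practice, load and preprocess images from disk here
        x = np.zeros((size, *image_shape), dtype=np.float32)
        y = np.zeros(size, dtype=np.int64)
        yield x, y

batches = list(batch_generator(n_samples=100, batch_size=32))
print(len(batches))             # 4 batches: 32 + 32 + 32 + 4
print(batches[-1][0].shape[0])  # the last batch holds the remaining 4 samples
```

In TF2, model.fit accepts a generator (or, better, a tf.data.Dataset built with batching and prefetching), so the full 12 GB matrix never needs to exist at once in either RAM or VRAM.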

Upvotes: 1
