Reputation: 161
I'm currently running a PyTorch model which periodically calls out to a TensorFlow model for benchmarking purposes. I'd like both of these models to be GPU-enabled and to run in the same script. Since the TensorFlow benchmarking code claims GPU memory until the end of the process, I've elected to run the benchmarking code in a multiprocessing.Process so that my PyTorch model can use the full GPU memory after the benchmarking script has run.
While doing this, I've stumbled across an unusual bug (?) in TensorFlow's GPU utilization. It seems that TensorFlow run in a subprocess doesn't want to use a GPU that has been used at all by the parent process. I can have TensorFlow models and PyTorch models on the same GPU in the same process with no problems, but when I introduce subprocesses, TensorFlow is ill-behaved.
I'm running:
tensorflow-gpu==1.14.0
torch==1.1.0
cudatoolkit=10.0
on an NVIDIA RTX 2080 Ti.
Below is a minimal code snippet to reproduce:
import torch
import tensorflow as tf
from multiprocessing import Process

def f():
    # Check whether TensorFlow can see the GPU from inside the child process
    print(tf.test.is_gpu_available())

# First subprocess: the parent has not touched the GPU yet
pa = Process(target=f, args=())
pa.start()
pa.join()

# Touch the GPU in the parent process
torch.ones(1).cuda()

# Second subprocess: the parent has now used the GPU
pb = Process(target=f, args=())
pb.start()
pb.join()
This prints:
True
False
Upvotes: 0
Views: 703
Reputation: 161
To anyone running into this problem: you need to call multiprocessing.set_start_method('spawn'). TensorFlow is not fork-safe, and some weirdness can happen with global variables/modules that is probably very hard to reason about. Remember to call it only once, inside an if __name__ == '__main__': check.
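For reference, here is a minimal sketch of the fix applied to the snippet from the question (assuming Python 3, where the 'spawn' start method is available on all platforms):
import torch
import tensorflow as tf
from multiprocessing import Process, set_start_method

def f():
    print(tf.test.is_gpu_available())

if __name__ == '__main__':
    # 'spawn' starts a fresh interpreter for each child instead of forking,
    # so the child does not inherit the parent's CUDA/TensorFlow state.
    set_start_method('spawn')

    pa = Process(target=f, args=())
    pa.start()
    pa.join()

    torch.ones(1).cuda()

    pb = Process(target=f, args=())
    pb.start()
    pb.join()
With 'spawn', both subprocesses should print True.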
Upvotes: 2