Fabian N.

Reputation: 3856

Theano is creating additional threads on devices not specified in configuration

So I have the following situation:

  1. a computer with 3 graphics cards
  2. a Python script using Keras with the Theano backend and multiple threads

I specified the device to use in .theanorc as described in the documentation.
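
For reference, the same settings can also be passed through the THEANO_FLAGS environment variable; a minimal sketch (the device name gpu0 and the CNMeM fraction are placeholders for the values in my .theanorc):

import os

# Equivalent to the .theanorc settings: pin Theano to one device and enable CNMeM.
# "gpu0" and 0.95 are placeholders for the values actually configured.
os.environ.setdefault("THEANO_FLAGS", "device=gpu0,floatX=float32,lib.cnmem=0.95")

import theano  # must be imported only after the flags are set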

The Python script is of this form (I'm still working on a standalone example):

import theano
from threading import Thread
...
class Test(Thread):

    def run(self):
        # calculations with Keras
        pass

test = Test()
test.start()
test.join()

When I start the script, Theano uses the specified device, but after some time a second Python thread appears on one of the other graphics cards (and uses up resources).

The second thread seems to ignore the config: it's running on the wrong GPU and isn't allocating RAM as specified by the CNMeM flag.

According to the documentation this should not be possible, as everything forked from the thread that started the Theano calculation should run on the same device (ensured by importing Theano right at the beginning).

After some poking around I found out that this behavior stops when I don't run my Keras code in a separate thread.

So before I start creating GitHub issues, I would like some pointers on what's most likely:

  1. Is this a bug in Theano?
  2. Is this a bug in Keras?
  3. Is this a bug in my own code?

Regarding 3: my whole project doesn't create separate Python processes (confirmed via the process list) and doesn't change any Theano configuration.

Any idea what could even cause this kind of behavior?

Upvotes: 0

Views: 179

Answers (1)

Huskar

Reputation: 61

The device (GPU) setting of a thread is independent of the other threads in the same process. See this for more details.

I haven't found a way to set the device for the current thread in Theano. I use the deprecated cuda_ndarray backend, where there is no way to do this; I don't know whether the gpuarray backend offers one.

Here is the workaround I use:

import numpy as np
import theano
from theano import Apply
from theano import tensor as T
from theano.scalar import Scalar
from theano.sandbox.cuda import GpuOp, nvcc_compiler

class SetGpu(GpuOp):
    '''
    Set device(gpu) for current thread.
    '''

    def c_compiler(self):
        return nvcc_compiler.NVCC_compiler

    def make_node(self, gpu_id):
        dummy_out = Scalar("int32")()
        return Apply(self, [gpu_id], [dummy_out])

    def __str__(self):
        return "SetGpu"

    def c_support_code_apply(self, node, nodename):
        return ""

    def c_code(self, node, nodename, inps, outs, sub):
        gpu_id, = inps
        dummy_out, = outs
        return """
        int _gpu_id = *((int*)PyArray_DATA(%(gpu_id)s));
        %(dummy_out)s = _gpu_id;
        cudaError_t err = cudaSetDevice(_gpu_id);
        if(err != cudaSuccess){
            PyErr_Format(PyExc_RuntimeError, "Cuda err:\\"%%s\\" when calling cudaSetDevice(%%d).", cudaGetErrorString(err), _gpu_id);
            return 0;
        }
    """ % locals()

def set_gpu(gpu_id):
    # Build the Theano function once and cache it on the function object,
    # then call it to switch the current thread to the given device.
    if not hasattr(set_gpu, "f"):
        set_gpu_op = SetGpu()
        gpu_id_var = T.iscalar()
        dummy_out = set_gpu_op(gpu_id_var)
        set_gpu.f = theano.function([gpu_id_var], [dummy_out])
    _dummy_out = set_gpu.f(gpu_id)

if __name__ == "__main__":
    def test():
        set_gpu(5)
        print "Test thread is using gpu %d." % theano.sandbox.cuda.active_device_number()
    print "Main thread is using gpu %d." % theano.sandbox.cuda.active_device_number()
    from threading import Thread
    thread = Thread(target=test)
    thread.start()
    thread.join()

So let's call this file set_gpu.py.

Here is what I get when running it:

python set_gpu.py 
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5110)
Main thread is using gpu 0.
Test thread is using gpu 5.
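
In your setup you could call set_gpu at the top of the worker thread's run() before any Keras work; a minimal sketch (the gpu id 0 is a placeholder for whichever device you configured):

from threading import Thread
from set_gpu import set_gpu  # the workaround module above

class Test(Thread):
    def run(self):
        # Pin this thread to the configured device before any Keras/Theano calls.
        set_gpu(0)  # 0 is a placeholder gpu id
        # ... Keras calculations ...

test = Test()
test.start()
test.join()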

Upvotes: 2
