oferlivny

Reputation: 300

Change number of threads for Tensorflow inference with C API

I'm writing a C++ wrapper around the TensorFlow 1.2 C API (for inference, if it matters). My application is multi-process and multi-threaded, with resources allocated explicitly, so I would like to limit TensorFlow to a single thread.

Currently, when I run a simple inference test that allows batch processing, I see it using all CPU cores. I have tried limiting the number of threads for a new session using a mixture of C and C++ as follows (forgive the partial code snippet; I hope it makes sense):

tensorflow::ConfigProto conf;
conf.set_intra_op_parallelism_threads(1);
conf.set_inter_op_parallelism_threads(1);
conf.add_session_inter_op_thread_pool()->set_num_threads(1);
std::string str;
conf.SerializeToString(&str);
TF_SetConfig(m_session_opts,(void *)str.c_str(),str.size(),m_status);
m_session = TF_NewSession(m_graph, m_session_opts, m_status);

But it doesn't seem to make any difference: all cores are still fully utilized.

Am I using the C API correctly?

(My current workaround is to recompile TensorFlow with the number of threads hard-coded to 1, which will probably work, but it's obviously not the best approach...)
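For callers that want to stay in plain C and avoid linking the protobuf classes, the serialized bytes handed to TF_SetConfig can be sketched by hand. This is a minimal sketch, not part of the TensorFlow API: the helper names are hypothetical, and the field numbers (2 for intra_op_parallelism_threads, 5 for inter_op_parallelism_threads) are assumed from tensorflow/core/protobuf/config.proto and should be checked against the TensorFlow version in use.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: appends one varint-typed protobuf field (wire type 0).
// Only valid for field numbers below 16, so the key fits in a single byte.
static void AppendVarintField(std::string& out, uint32_t field_number,
                              uint64_t value) {
    // Key byte = (field_number << 3) | wire_type, with wire_type 0 for varint.
    out.push_back(static_cast<char>(field_number << 3));
    // Varint: 7 bits per byte, low group first, high bit set on all but the last.
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Hypothetical helper: encodes only the two thread-count fields of ConfigProto.
// ASSUMPTION: field numbers 2 and 5, as listed in config.proto.
std::string EncodeThreadConfig(uint32_t intra_threads, uint32_t inter_threads) {
    std::string out;
    AppendVarintField(out, 2, intra_threads);  // intra_op_parallelism_threads
    AppendVarintField(out, 5, inter_threads);  // inter_op_parallelism_threads
    return out;
}
```

With both values set to 1 this yields the four bytes 0x10 0x01 0x28 0x01, which could then be passed as TF_SetConfig(m_session_opts, bytes.data(), bytes.size(), m_status). It only reproduces what SerializeToString would emit for those two fields, so it is not expected to behave differently from the snippet above.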

-- Update --

I also tried adding:

conf.set_use_per_session_threads(true);

Without success. Multiple cores are still used...

I also tried to run with high log verbosity, and got this output (showing only what I think is relevant):

tensorflow/core/common_runtime/local_device.cc:40] Local device intraop parallelism threads: 8
tensorflow/core/common_runtime/session_factory.cc:75] SessionFactory type DIRECT_SESSION accepts target: 
tensorflow/core/common_runtime/direct_session.cc:95] Direct session inter op parallelism threads for pool 0: 1

The "parallelism threads: 8" message shows up as soon as I instantiate a new graph using TF_NewGraph(). I haven't found a way to specify options before this graph allocation, though...

Upvotes: 3

Views: 5639

Answers (2)

fisakhan

Reputation: 802

There is nothing wrong with your use of the TensorFlow C API. This is a limitation of the C API: it generates at least N threads, where N is the number of cores, and you can't reduce that further.

Setting OMP_NUM_THREADS in the environment can change the number of threads, but TensorFlow overrides that setting and still generates N threads.

However, you can restrict which cores the process runs on. taskset, numactl, and thread affinity settings can lock a given process to a given core, but they don't change the number of threads.

The approaches mentioned above can reduce the total number of threads generated by TensorFlow, but not to 1. TensorFlow will still create multiple threads, depending on the number of cores in the CPU. In most cases only one thread will be active while the others sleep. I don't think it is possible to have a single-threaded TensorFlow.
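The affinity approach can also be applied from inside the process. The following is a minimal Linux-only sketch, assuming glibc's sched_getaffinity/sched_setaffinity; PinToSingleCpu is a hypothetical name. It would need to run before TensorFlow starts its worker threads, since threads created afterwards inherit the caller's mask.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros on glibc
#endif
#include <sched.h>

// Hypothetical helper: restricts the calling thread to the first CPU it is
// currently allowed to run on. Returns false if the affinity calls fail.
bool PinToSingleCpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return false;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0;
        }
    }
    return false;  // no allowed CPU found
}
```

As with taskset, this limits where the threads may run, not how many of them are created.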
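To check how many threads a process actually ends up with, one option on Linux is to count the entries under /proc/self/task. This is a minimal sketch under that assumption; CountThreadsOfThisProcess is a hypothetical name, and it reports existing threads whether they are active or sleeping.

```cpp
#include <dirent.h>

// Hypothetical helper: returns the number of threads in the current process
// by counting the per-thread directories in /proc/self/task (Linux only).
// Returns -1 if the directory cannot be opened.
int CountThreadsOfThisProcess() {
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) {
        return -1;
    }
    int count = 0;
    while (const dirent* entry = readdir(dir)) {
        // Skip "." and ".."; every other entry is named after a thread ID.
        if (entry->d_name[0] != '.') {
            ++count;
        }
    }
    closedir(dir);
    return count;
}
```

Calling it before and after creating the session would show how many threads the session setup added.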

The following GitHub issues support my point of view:
https://github.com/tensorflow/tensorflow/issues/33627
https://github.com/usnistgov/frvt/issues/30

Building TensorFlow C++ API from source and changing the source code may help.

Upvotes: 0

Mike S.

Reputation: 118

I had the same problem and solved it by setting the number of threads when creating the first TF session in my application. If the first session is created without an options object, TF creates as many worker threads as the number of cores on the machine * 2.

Here is the C++ code I used:

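#include <memory>
#include "tensorflow/core/public/session.h"

// Call once when the application starts, before any other session is created
void InitThreads(int coresToUse)
{
    // Initialize the number of worker threads
    tensorflow::SessionOptions options;
    tensorflow::ConfigProto& config = options.config;
    if (coresToUse > 0)
    {
        config.set_inter_op_parallelism_threads(coresToUse);
        config.set_intra_op_parallelism_threads(coresToUse);
        // false: share the global thread pool instead of per-session threads
        config.set_use_per_session_threads(false);
    }
    // Create (and immediately close) a session so the settings take effect
    std::unique_ptr<tensorflow::Session>
        session(tensorflow::NewSession(options));
    if (session)  // NewSession returns nullptr on failure
    {
        session->Close();
    }
}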

Pass 1 to limit the number of inter-op and intra-op threads to 1 each.

Edit: IMPORTANT NOTE: This code works when called from the main application (the Google sample trainer) but stopped working when I moved it to a DLL dedicated to wrapping TensorFlow. TF 1.4.1 ignores the parameter I pass and spins up all threads. I would like to hear your comments...

Upvotes: 4
