Mohammed AlQuraishi

Reputation: 1557

Restricting number of cores used

I'm trying to restrict the number of cores that a tf session uses but it's not working. This is how I'm initializing the session:

import tensorflow as tf

sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
                                        intra_op_parallelism_threads=1,
                                        use_per_session_threads=True))

The system has 12 cores / 24 threads, and I can see that 40-60% of them are being used at any given point in time. The system also has 8 GPUs, but I construct the whole graph with tf.device('/cpu:0').

UPDATE: To clarify, the graph itself is a simple LSTM-RNN that hews very closely to the examples in the TF source code. For completeness, here's the full graph:

# Inputs are time-major: (n_steps, batch_size, input_size).
node_input  = tf.placeholder(tf.float32, [n_steps, batch_size, input_size],  name = 'input')
# Split the input into a list of n_steps tensors of shape (batch_size, input_size).
list_input  = [tf.reshape(i, (batch_size, input_size)) for i in tf.split(0, n_steps, node_input)]
node_target = tf.placeholder(tf.float32, [n_steps, batch_size, output_size], name = 'target')
# Transpose targets to batch-major and flatten to (batch_size * n_steps, output_size).
node_target_flattened = tf.reshape(tf.transpose(node_target, perm = [1, 0, 2]), [-1, output_size])
node_max_length = tf.placeholder(tf.int32, name = 'batch_max_length')
# Build the LSTM cell and unroll the RNN.
node_cell_initializer = tf.random_uniform_initializer(-0.1, 0.1)
node_cell = LSTMCell(state_size, input_size, initializer = node_cell_initializer)
node_initial_state = node_cell.zero_state(batch_size, tf.float32)
nodes_output, nodes_state = rnn(node_cell,
                                list_input,
                                initial_state = node_initial_state,
                                sequence_length = node_max_length)
# Concatenate the per-step outputs and flatten to match the target layout.
node_output_flattened = tf.reshape(tf.concat(1, nodes_output), [-1, state_size])
# Softmax output layer, cross-entropy loss, and Adam optimizer.
node_softmax_w = tf.Variable(tf.random_uniform([state_size, output_size]), name = 'softmax_w')
node_softmax_b = tf.Variable(tf.zeros([output_size]), name = 'softmax_b')
node_logit = tf.matmul(node_output_flattened, node_softmax_w) + node_softmax_b
node_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(node_logit, node_target_flattened, name = 'cross_entropy')
node_loss = tf.reduce_mean(node_cross_entropy, name = 'loss')
node_optimizer = tf.train.AdamOptimizer().minimize(node_loss)
node_op_initializer = tf.initialize_all_variables()

One important thing to note: if I pass in the appropriate parameters the first time I call tf.Session, then the session does run on only a single core. The problem is that I am unable to change this behavior on subsequent runs, even though I use use_per_session_threads, which is supposed to specifically allow for session-specific settings. That is, even after I close the session with sess.close() and start a new one with new options, the original behavior remains unchanged unless I restart the Python kernel, which is very costly because loading my data takes nearly an hour.

Upvotes: 6

Views: 3738

Answers (3)

gregoruar

Reputation: 383

In TensorFlow 2.3.2 I managed to limit the CPUs used by means of the psutil library. I put this at the beginning of the function:

import os
import psutil

# Pin the current process to cores 0 and 1.
pid = psutil.Process(os.getpid())
pid.cpu_affinity([0, 1])

The subsequent call to model.fit then utilized exactly 2 CPUs.
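Note that cpu_affinity pins the whole process to the given cores at the OS scheduler level, so it caps all of TensorFlow's threads regardless of its internal thread-pool settings; the core indices ([0, 1] here) are just an example and can be any subset of the machine's cores.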

Upvotes: 0

Albert

Reputation: 68330

use_per_session_threads will only affect inter_op_parallelism_threads, not intra_op_parallelism_threads. The intra_op_parallelism_threads setting is used for the Eigen thread pool, which is always global, so subsequent sessions cannot influence it anymore.

Note that other TF functions can also trigger the initialization of the Eigen thread pool, so it may already be initialized before you create your first tf.Session. One example is tensorflow.python.client.device_lib.list_local_devices().

I work around this by creating a dummy session with the appropriate values very early in my Python script.
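A minimal sketch of that workaround (assuming the TF 1.x API, and that single-threaded execution is what you want globally):

import tensorflow as tf

# Run this before anything else that might initialize the Eigen thread
# pool (e.g. device_lib.list_local_devices or another Session).
_dummy = tf.Session(config=tf.ConfigProto(
    inter_op_parallelism_threads=1,
    intra_op_parallelism_threads=1))
_dummy.close()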

Upvotes: 1

Alexandre Passos

Reputation: 5206

TensorFlow does an optimization where, the first time a DirectSession is created, it creates static thread pools which are then reused. If you want to change this, specify multiple different thread pools in the session_inter_op_thread_pool field and specify which one you want to use for each run.
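A minimal sketch of that approach (assuming the TF 1.x protobuf API; the pool sizes here are arbitrary examples):

import tensorflow as tf

config = tf.ConfigProto()
# Pool 0: single-threaded; pool 1: four threads.
config.session_inter_op_thread_pool.add(num_threads=1)
config.session_inter_op_thread_pool.add(num_threads=4)

sess = tf.Session(config=config)
# Select pool 1 for a particular step via RunOptions.
opts = tf.RunOptions(inter_op_thread_pool=1)
# sess.run(fetches, options=opts)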

Upvotes: 0
