Reputation: 1557
I'm trying to restrict the number of cores that a tf session uses but it's not working. This is how I'm initializing the session:
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
                                        intra_op_parallelism_threads=1,
                                        use_per_session_threads=True))
The system has 12 cores / 24 threads, and I can see that 40-60% of them are being used at any given point in time. The system also has 8 GPUs, but I construct the whole graph with tf.device('/cpu:0').
UPDATE: To clarify, the graph itself is a simple LSTM-RNN that hews very closely to the examples in the tf source code. For completeness, here's the full graph:
node_input = tf.placeholder(tf.float32, [n_steps, batch_size, input_size], name = 'input')
list_input = [tf.reshape(i, (batch_size, input_size)) for i in tf.split(0, n_steps, node_input)]
node_target = tf.placeholder(tf.float32, [n_steps, batch_size, output_size], name = 'target')
node_target_flattened = tf.reshape(tf.transpose(node_target, perm = [1, 0, 2]), [-1, output_size])
node_max_length = tf.placeholder(tf.int32, name = 'batch_max_length')
node_cell_initializer = tf.random_uniform_initializer(-0.1, 0.1)
node_cell = LSTMCell(state_size, input_size, initializer = node_cell_initializer)
node_initial_state = node_cell.zero_state(batch_size, tf.float32)
nodes_output, nodes_state = rnn(node_cell,
                                list_input,
                                initial_state = node_initial_state,
                                sequence_length = node_max_length)
node_output_flattened = tf.reshape(tf.concat(1, nodes_output), [-1, state_size])
node_softmax_w = tf.Variable(tf.random_uniform([state_size, output_size]), name = 'softmax_w')
node_softmax_b = tf.Variable(tf.zeros([output_size]), name = 'softmax_b')
node_logit = tf.matmul(node_output_flattened, node_softmax_w) + node_softmax_b
node_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(node_logit, node_target_flattened, name = 'cross_entropy')
node_loss = tf.reduce_mean(node_cross_entropy, name = 'loss')
node_optimizer = tf.train.AdamOptimizer().minimize(node_loss)
node_op_initializer = tf.initialize_all_variables()
One important thing to note is that if I pass in the appropriate parameters the first time I call tf.Session, then the session does run on only a single core. The problem is that in subsequent runs I am unable to change the behavior, even though I use use_per_session_threads, which is supposed to specifically allow session-specific settings. That is, even after I close the session using sess.close() and start a new one with new options, the original behavior remains unchanged unless I restart the Python kernel (which is very costly because it takes nearly an hour to load my data).
Upvotes: 6
Views: 3738
Reputation: 383
In TensorFlow 2.3.2 I managed to limit the CPUs by using the psutil library.
I put this at the beginning of the function:
import os
import psutil
pid = psutil.Process(os.getpid())
pid.cpu_affinity([0, 1])
The subsequent call to model.fit then utilized exactly 2 CPUs.
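For reference, a minimal end-to-end sketch of this approach (the toy Keras model and random data are made up purely for illustration; cpu_affinity is only available on Linux and Windows):
import os
import numpy as np
import psutil
import tensorflow as tf

# Pin this process (and all threads it spawns) to logical CPUs 0 and 1
# before any heavy TensorFlow work starts.
psutil.Process(os.getpid()).cpu_affinity([0, 1])

# A throwaway model and data, just to show that model.fit now runs on
# the two pinned CPUs only.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 4).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)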
Upvotes: 0
Reputation: 68330
use_per_session_threads will only affect the inter_op_parallelism_threads, but not the intra_op_parallelism_threads. The intra_op_parallelism_threads will be used for the Eigen thread pool (see here), which is always global, so subsequent sessions will not influence it anymore.
Note that there are other TF functions which can also trigger the initialization of the Eigen thread pool, so it can happen that it is already initialized before you create the first tf.Session. One example is tensorflow.python.client.device_lib.list_local_devices().
I work around this by creating a dummy session with the appropriate values very early in my Python script, as sketched below.
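A minimal sketch of that workaround, assuming the TF 1.x API used in the question:
import tensorflow as tf

# Create a throwaway session as early as possible so the global Eigen
# thread pool is initialized with the desired thread counts.
_config = tf.ConfigProto(inter_op_parallelism_threads=1,
                         intra_op_parallelism_threads=1)
tf.Session(config=_config).close()

# Everything after this point (graph construction, the real sessions)
# reuses the already-initialized global thread pool.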
Upvotes: 1
Reputation: 5206
TensorFlow does an optimization where the first time a DirectSession is created it builds static thread pools, which are then reused. If you want to change this, specify multiple different thread pools in the session_inter_op_thread_pool field and specify which one you want to use for each run; see the sketch below.
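A minimal sketch of what that could look like, assuming the TF 1.x ConfigProto.session_inter_op_thread_pool and RunOptions.inter_op_thread_pool proto fields:
import tensorflow as tf

# Declare the thread pools up front; they are created once with the
# first session and then reused.
config = tf.ConfigProto()
config.session_inter_op_thread_pool.add(num_threads=1)  # pool 0: single-threaded
config.session_inter_op_thread_pool.add(num_threads=8)  # pool 1: eight threads

with tf.Session(config=config) as sess:
    total = tf.add(tf.constant(1.0), tf.constant(2.0))
    # Pick pool 0 for this particular run() call.
    opts = tf.RunOptions(inter_op_thread_pool=0)
    print(sess.run(total, options=opts))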
Upvotes: 0