Reputation: 31
I have created a Monte-Carlo simulation model implemented in TensorFlow 2.5. The model mostly consists of vector multiplications inside a tf.while_loop. I am benchmarking the performance on a Linux machine with 8 virtual CPUs. When I run the model in graph mode (without XLA optimization), it fully utilizes all 8 CPUs (I can see %CPU close to 800% using the top command). However, when I run the model after compiling with XLA (by passing jit_compile=True to the @tf.function decorator), the %CPU utilization stays around 250%. Is there a way to force TensorFlow to utilize all available CPU capacity with XLA?
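For context, the loop has roughly this structure (a simplified, made-up sketch; the real model and shapes are different):

import tensorflow as tf

@tf.function(jit_compile=True)  # set to False to benchmark plain graph mode
def simulate(weights, n_steps):
    # Toy stand-in for the Monte-Carlo loop: repeated vector
    # multiplications inside a tf.while_loop.
    state = tf.ones_like(weights)

    def cond(i, s):
        return i < n_steps

    def body(i, s):
        return i + 1, s * weights

    _, final_state = tf.while_loop(cond, body, [tf.constant(0), state])
    return tf.reduce_mean(final_state)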
I have experimented with changing the inter_op_parallelism and intra_op_parallelism thread settings. Setting both to 1 reduces the CPU utilization from 250% to 100%, but increasing them to 8 doesn't push the utilization beyond 250%.
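For reference, I set them through the standard tf.config.threading API (the value 8 matches the number of virtual CPUs on my machine):

import tensorflow as tf

# Must be called before TensorFlow creates its thread pools,
# i.e. before any op is executed.
tf.config.threading.set_inter_op_parallelism_threads(8)
tf.config.threading.set_intra_op_parallelism_threads(8)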
Any help or suggestions on what might be going on would be appreciated.
Upvotes: 1
Views: 500
Reputation: 1
I had the same question. Using the suggestions found at https://www.tensorflow.org/xla, I modified the JIT compile sequence for my ML model to something like this:

import os
import tensorflow as tf

# Ask XLA to dump its generated artifacts so they can be inspected.
os.environ['XLA_FLAGS'] = '--xla_dump_to=/tmp/dump'

@tf.function(jit_compile=True)
def foo(data):
    return model(data)
This produces an object (*.o) file in /tmp/dump, which I disassembled with objdump -d. Looking at the disassembly, it appears that the compiler has generated straight-line code for the model and its computational kernels rather than calling out to libraries that might support parallel execution. I don't see anything that suggests the possibility of parallel execution of this JIT-ted model, although like you I do observe parallel execution when I simply call the model.
However, for me the best performance for this particular model comes from using @tf.function() with jit_compile=False. In this case I observe intra-op parallelism happening, but no inter-op parallelism, which is also what I see when simply calling the model.
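For completeness, the variant that performs best for me is simply the same (hypothetical) foo as above, without XLA:

@tf.function(jit_compile=False)  # regular graph mode, no XLA
def foo(data):
    # With plain graph execution I see intra-op parallelism across the CPUs.
    return model(data)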
Upvotes: 0