westcoaststudent

Reputation: 67

parallel_for over a number of threads

I have limited experience with multithreading. I'm currently looking at the PyTorch code, where a for loop is parallelized using their custom implementation of parallel_for (similar implementations seem to exist in other C++ codebases), here:

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp#L2747

My question is, why is it parallelizing over the number of threads? In most use cases where I see a for loop parallelized, it divides the domain (e.g., the indices of an array), but here it is dividing the threads. Is this a standard multithreading pattern?
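To illustrate what I mean, here's a simplified, self-contained sketch (plain std::thread, not the actual PyTorch code) of the pattern: the outer loop runs over thread ids, and each thread derives its own sub-range of the data from its id.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Simplified sketch (not the actual PyTorch code): the parallel loop runs
// over thread/task ids, and each thread maps its id to a sub-range of the data.
int main() {
  const int64_t n = 4000;                      // number of items to process
  std::vector<float> data(n, 1.0f);

  const int64_t num_threads =
      std::max<int64_t>(1, std::thread::hardware_concurrency());
  const int64_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / num_threads)

  std::vector<std::thread> workers;
  for (int64_t t = 0; t < num_threads; ++t) {  // "parallelize over the threads"
    workers.emplace_back([&, t] {
      const int64_t begin = t * chunk;         // each thread derives its own range
      const int64_t end = std::min(n, begin + chunk);
      for (int64_t i = begin; i < end; ++i) {
        data[i] *= 2.0f;                       // per-element work
      }
    });
  }
  for (auto& w : workers) w.join();
}
```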

Upvotes: 0

Views: 419

Answers (1)

MSalters

Reputation: 180295

Say you want a parallel_for loop over 4000 items and you have 2 CPUs (threads) available. You could choose an arbitrary domain size of 1000. Each thread then needs to process 2 of those domains, so you've factored the problem into 2*2*1000.

If instead you let the thread count set the domain size, you factor the problem into 2*2000. That's a bit simpler and has less threading overhead: each thread gets exactly one domain.
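To make the factoring concrete, here's a small illustrative sketch (using the 4000-item / 2-thread numbers above; the round-robin assignment in the first scheme is just one possible way of handing two chunks to each thread) that prints which ranges each thread ends up owning under both schemes:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  const int64_t n = 4000;        // items to process
  const int64_t threads = 2;     // available threads

  // Scheme 1: arbitrary domain size of 1000 -> 4 chunks, 2 per thread (2*2*1000).
  const int64_t small_chunk = 1000;
  for (int64_t t = 0; t < threads; ++t)
    for (int64_t c = t; c * small_chunk < n; c += threads)
      std::cout << "thread " << t << ": [" << c * small_chunk << ", "
                << std::min(n, (c + 1) * small_chunk) << ")\n";

  // Scheme 2: domain size set by the thread count -> one chunk per thread (2*2000).
  const int64_t big_chunk = (n + threads - 1) / threads;
  for (int64_t t = 0; t < threads; ++t)
    std::cout << "thread " << t << ": [" << t * big_chunk << ", "
              << std::min(n, (t + 1) * big_chunk) << ")\n";
}
```

In the second scheme the range computation runs once per thread and there's no extra bookkeeping about which thread picks up which leftover chunk.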

Upvotes: 1
