westcoaststudent

Reputation: 67

parallel_for over a number of threads

I have limited experience with multithreading. I'm currently looking at the PyTorch code, where a for loop is parallelized using their custom implementation of parallel_for (similar implementations seem to exist in other C++ codebases), here:

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp#L2747

My question is, why is it parallelizing over the number of threads? In most use cases where I see a for loop parallelized, it divides the domain (e.g., the indices of an array), but here it is dividing the threads. Is this a standard multithreading pattern?
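To illustrate what I mean, here's a simplified, self-contained sketch (plain std::thread, not the actual PyTorch code) of the pattern: the outer loop runs over thread ids, and each thread derives its own sub-range of the data from its id.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Simplified sketch (not the actual PyTorch code): the parallel loop runs
// over thread/task ids, and each thread maps its id to a sub-range of the data.
int main() {
  const int64_t n = 4000;                      // number of items to process
  std::vector<float> data(n, 1.0f);

  const int64_t num_threads =
      std::max<int64_t>(1, std::thread::hardware_concurrency());
  const int64_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / num_threads)

  std::vector<std::thread> workers;
  for (int64_t t = 0; t < num_threads; ++t) {  // "parallelize over the threads"
    workers.emplace_back([&, t] {
      const int64_t begin = t * chunk;         // each thread derives its own range
      const int64_t end = std::min(n, begin + chunk);
      for (int64_t i = begin; i < end; ++i) {
        data[i] *= 2.0f;                       // per-element work
      }
    });
  }
  for (auto& w : workers) w.join();
}
```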

Upvotes: 0

Views: 419

Answers (1)

MSalters

Reputation: 180295

Say you want a parallel_for loop over 4000 items and you have 2 CPUs (threads) available. You could choose an arbitrary domain size of 1000. Each thread then needs to process 2 of those domains, so you've factored the problem into 2*2*1000.

If instead you let the thread count set the domain size, you factor the problem into 2*2000. That's a bit simpler and has less threading overhead: each thread gets exactly one domain.
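To make the factoring concrete, here's a small illustrative sketch (using the 4000-item / 2-thread numbers above; the round-robin assignment in the first scheme is just one possible way of handing two chunks to each thread) that prints which ranges each thread ends up owning under both schemes:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  const int64_t n = 4000;        // items to process
  const int64_t threads = 2;     // available threads

  // Scheme 1: arbitrary domain size of 1000 -> 4 chunks, 2 per thread (2*2*1000).
  const int64_t small_chunk = 1000;
  for (int64_t t = 0; t < threads; ++t)
    for (int64_t c = t; c * small_chunk < n; c += threads)
      std::cout << "thread " << t << ": [" << c * small_chunk << ", "
                << std::min(n, (c + 1) * small_chunk) << ")\n";

  // Scheme 2: domain size set by the thread count -> one chunk per thread (2*2000).
  const int64_t big_chunk = (n + threads - 1) / threads;
  for (int64_t t = 0; t < threads; ++t)
    std::cout << "thread " << t << ": [" << t * big_chunk << ", "
              << std::min(n, (t + 1) * big_chunk) << ")\n";
}
```

In the second scheme the range computation runs once per thread and there's no extra bookkeeping about which thread picks up which leftover chunk.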

Upvotes: 1
