Reputation: 67
I have limited experience with multithreading, and I'm currently looking at the PyTorch code, where a for loop is parallelized using their custom implementation of parallel_for (it seems to be defined similarly in other codebases and in C++) here:
My question is: why is it parallelizing over the number of threads? In most cases where I see a for loop parallelized, it divides up the domain (e.g., the indices of an array), but here it divides up the threads. Is this a standard way of multithreading?
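For concreteness, the pattern I mean looks roughly like the sketch below. This is my own simplified reconstruction, not the actual PyTorch code; the use of std::thread and the chunking arithmetic are just assumptions for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
    std::vector<float> data(4000, 1.0f);
    const int64_t n = static_cast<int64_t>(data.size());
    const int64_t num_threads =
        std::max<int64_t>(1, std::thread::hardware_concurrency());
    const int64_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    // The outer loop runs over thread ids, not over array indices;
    // each thread derives its own slice of the array from its id.
    for (int64_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const int64_t begin = t * chunk;
            const int64_t end = std::min(n, begin + chunk);
            for (int64_t i = begin; i < end; ++i) data[i] *= 2.0f;
        });
    }
    for (auto& w : workers) w.join();
    return 0;
}
```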
Upvotes: 0
Views: 419
Reputation: 180295
Say you want to have a parallel_for loop over 4000 items, and you have 2 CPUs (threads) available. You could choose an arbitrary domain of size 1000; each thread then needs to process 2 of those domains, and you've factored the problem into 2*2*1000.

If you don't choose an arbitrary domain, but instead let the thread count set the domain size, you factor the problem into 2*2000. This is a bit simpler: there's less overhead for the threads, and each thread gets a single domain.
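As a rough sketch (not PyTorch's actual implementation; the parallel_for signature, the grain parameter, and the round-robin chunk assignment here are assumptions for illustration), both factorings can be expressed like this:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical parallel_for: split [begin, end) into chunks of `grain`
// and hand them out round-robin to `num_threads` worker threads.
void parallel_for(int64_t begin, int64_t end, int64_t grain, int64_t num_threads,
                  const std::function<void(int64_t, int64_t)>& body) {
    std::vector<std::thread> workers;
    for (int64_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([=, &body] {
            // Thread t processes chunks t, t + num_threads, t + 2*num_threads, ...
            for (int64_t c = begin + t * grain; c < end; c += num_threads * grain) {
                body(c, std::min(end, c + grain));
            }
        });
    }
    for (auto& w : workers) w.join();
}

int main() {
    std::vector<float> data(4000, 1.0f);
    auto body = [&](int64_t i0, int64_t i1) {
        for (int64_t i = i0; i < i1; ++i) data[i] *= 2.0f;
    };

    // grain = 1000 with 2 threads: each thread processes 2 chunks (2*2*1000).
    parallel_for(0, 4000, 1000, 2, body);

    // grain = 2000 (= 4000 / 2 threads): each thread gets exactly one chunk
    // (2*2000), so the scheduling loop inside each thread runs only once.
    parallel_for(0, 4000, 2000, 2, body);
    return 0;
}
```

With the larger grain there is no per-chunk bookkeeping beyond the single call per thread, which is the "each thread gets a single domain" case described above.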
Upvotes: 1