Reputation: 672
Let's imagine that we have T number of threads and we want to distribute a problem of size N to those threads. Every thread will chose a part of that problem to execute it. Each thread will use the thread_id (a number from 0 to T-1), the T and the N in order to calculate the range of the sub-problem. Let's say that the range of the sub-problem is [S, E) where S and E belong to [0, N].
For example. Let's say we have an array of integers. The size of the array is 10. We want increase each element of that array by one and we want to do that in parallel using 4 threads.
Does anyone know a fast algorithm that will calculate those ranges? Preferably without atomics or branches.
Upvotes: 1
Views: 950
Reputation: 16090
I think these two examples should work. All operations are integer. Except the one marked that it's not.
This one has simpler logic, however it does not distribute work as you require. It would assign the bigger chunk of work to all workers except the last one which would get significantly lower share. It shouldn't be a problem in theory as the maximal amount of work for one worker stays the same.
items_per_thread = ceil(N/T); // This is not an integer division.
start = thread_id*items_per_thread;
stop = min(start+items_per_thread, N);
This one should distribute work as you require.
items_per_thread = N/T;
start = thread_id*items_per_thread+min(thread_num, N mod T);
stop = start+items_per_thread;
if(thread_num < N mod T) stop += 1;
I don't think it's possible to avoid branches.
Feeling adventurous I made a live demo in python, it includes ciamej's method too.:
import math
def distribution1(id ,N, T):
items_per_thread = math.ceil(N/T);
start = id*items_per_thread;
stop = min(start+items_per_thread, N);
return (start, stop)
def distribution2(id ,N, T):
items_per_thread = math.floor(N/T);
start = id*items_per_thread+min(id, N % T);
stop = start+items_per_thread;
if(id < N % T): stop += 1;
return (start, stop)
def distribution3(id ,N, T):
S = math.floor(id * N/T)
E = math.floor((id + 1) * N/T)
return (S,E)
def distribute(N, T, method):
ret = []
for i in range(T):
ret.append(method(i, N, T))
return ret
N=10
T=4
print(distribute(N, T, distribution1))
print(distribute(N, T, distribution2))
print(distribute(N, T, distribution3))
Output:
[(0, 3), (3, 6), (6, 9), (9, 10)]
[(0, 3), (3, 6), (6, 8), (8, 10)]
[(0, 2), (2, 5), (5, 7), (7, 10)]
Upvotes: 1
Reputation: 7068
If I understand correctly you're looking for such an equation?
S = floor(thread_id * N/T)
E = floor((thread_id + 1) * N/T)
If you multiply first (threadId * N
) and divide later (/N
) you can use integers for the computations and floor
function is not necessary.
Upvotes: 3