Reputation: 75
After asking another question about the MapReduce algorithm, this has got me thinking about how to determine the most efficient way of arriving at a total sum of n values using parallel processing. The problem can be simplified as follows:
Suppose I have n processors, each holding an integer. I want to determine the sum of the integers as quickly as possible.
Now I could get each processor 2,..,n to pass its integer to processor 1. Processor 1 then adds up each of the numbers in turn to produce the result. This means n - 1 passes of data, but these can all happen in parallel. This is followed by n - 1 addition operations, occurring in sequence.
Alternatively, I could get each odd numbered processor to pass its integer to the next even numbered processor (let's assume n is even, for the sake of argument). Each even numbered processor then performs one addition operation, in parallel, adding its own number to the one it's just been passed. We then end up with 1/2 n integers to add together. We could then use the previous method to add the remaining values.
Of course, there are many other ways of doing this. How do I determine which is the most efficient? I suspect that it depends on the relative cost of an addition operation vs passing an integer (in real life, think CPU vs network speed), and probably also on the size of n. After all, if n is very large, then adding an extra network hop in order to halve n may be worth it, even if each addition is relatively cheap.
Upvotes: 0
Views: 955
Reputation: 78316
This is more of a comment than an answer, but that little box is so confining ...
Define most efficient. Are you concerned with theoretical efficiency or speed in practice ?
I think you're asking yourself the right questions, and you seem to have realised that if you have 100,000 processors each with 1 integer then the critical resource is communications speed not computation speed. For any scheme you devise to sum N
integers starting with N
processors bear in mind that the communications time will be dominated not by the bandwidth (time to send 1 integer) but by the latency (time to send a 0 size message). For most practical purposes I expect that issue to kill your fancy schemes dead.
And another question: where did the integers originate ? If they originated on one process(or) and were distributed to the other N-1
you've almost certainly wasted more time sending them out than it would have taken the 1st process(or) to calculate the sum. If the integers originated as, perhaps, the result of a process running on each processor then, no matter the (in)efficiency, you'll have to do some kind of reduction anyway and pay the communications cost.
In practice you're only going to get a boost in speed when calculating the sum of N
integers on p
processors when N
is much larger than p
. To figure this out for your numbers on your parallel computer there's no replacement for experimentation.
Upvotes: 1