arraysmultithreadingalgorithmlanguage-agnosticparallel-processing

Reputation: 834

Addition of elements of Two Arrays

You have Two arrays

int[] a = {......} // Total elements 1 million
int[] b = {......} // Total elements 1 million , Length is same for both arrays.

Q1. I have to create an array

int[] c

whose elements are sum of corresponding indexes of a[] and b [] Ex.

 c[0] = a[0] + b[0];
 c[1] = a[1] + b[1];
 c[2] = a[2] + b[2];

Solution : -> I can take advantage of Multithreading. Divide the whole array in 10 or more parts and assign each segment to a thread to perform the calculation. Note-> Interviewer suggested to use Multithreading

Q2. Now its little bit changed.Elements of array C will have values like this :->

c[0] = a[0] + b[0];
c[1] = a[1] + b[1] + c[0]; // At this line c[0] is Sum of a[0] + b[0]; The Above Line
c[2] = a[2] + b[2] + c[1]; // At this line c[0] is Sum of a[1] + b[1]+ a[0] + b[0]; The Above Line

MySolution-> Solve Part 1 (Q1) and create a temporary array and after that we have to perform addition like this.

C[1] = temp[1]+temp[0]
C[2] = temp[2]+temp[1]

Note :-> We really don't need temp[], we can do this only using Array c Also. Just to explain this on SO in easy way.

Problem-> I dont think that in question 2 we can use multithreding. Is there a better way to solve Q2 ? Can we take advantage of multithreading in this.

Upvotes: 6

Answers (5)

Ian Cook

Reputation: 1196

You can do part 1 multi-threaded as you say giving -

temp[0] = a[0] + b[0];
temp[1] = a[1] + b[1];
temp[2] = a[2] + b[2];
etc....

Then the calculation for part 2 becomes -

c[0] = temp[0];
c[1] = temp[1] + temp[0];
c[2] = temp[2] + temp[1] + temp[0];
c[3] = temp[3] + temp[2] + temp[1] + temp[0];
etc...

Although this looks sequential and impossible to parallelize, it is in fact quite a common operation called a 'prefix sum' or 'scan'. For more details, including how to parallelize, see Wikipedia or Blelloch.

In the 8 elements case this becomes the following where each recursive phase can be parallelized as each calculation has no dependency on other calculations in the same phase.

// 1st phase 
u[0] = temp[0] + temp[1];
u[1] = temp[2] + temp[3];
u[2] = temp[4] + temp[5];
u[3] = temp[6] + temp[7];

// 2nd phase
v[0] = u[0] + u[1];
v[1] = u[2] + u[3];

// 3rd phase
w[0] = v[0] + v[1];

// final phase
c[0] = temp[0];
c[1] = u[0];
c[2] = u[0] + temp[2];
c[3] = v[0];
c[4] = v[0] + temp[4];
c[5] = v[0] + u[2];
c[6] = v[0] + u[2] + temp[6];
c[7] = w[0];

Upvotes: 4

Grijesh Chauhan

Reputation: 58281

In my opinion for question two you have two techniques:

First should be done in two steps. step-1 using threads you can add

 c[0] = a[0] + b[0];
 c[1] = a[1] + b[1];
 c[2] = a[2] + b[2];

as you suggested.

But step-2 should be done sequentially. because c[ i + 1] value depends on updated value of c [i]

Second technique is bit complex but can be fast.

What you are asked to do in second question is do something like:

 c[0] = a[0] + b[0];
 c[1] = a[1] + b[1] + a[0] + b[0];
 c[2] = a[2] + b[2] + a[1] + b[1] + a[0] + b[0];

This can be parallel.

c[i] =  thread1( sum(a[0]...a[i] )) + thread2( sum(b[0]...b[i] ))
         for i >= 0

Good is in this technique, you can calculate c[i] parallely for all i (its two like level threaded model).

You can further improve thread1/thread2 functions as multithread with child threads to perform sum – but remember sometime multi-threaded code runs slow then single-threaded code because of thread context-switching time (So you should give sufficient amount of task to each thread).

A point that is unlike about second technique is "most of what the threads for c[i] do is the same as what the threads for c[i-1] do as well".
Thanks to @jogojapan to let me know about this drawback point.

For better technique read updated answer by Mr.Jogojapan.

Upvotes: 5

jogojapan

Reputation: 70037

One way of using multi-threading for Q2 is to do it in two passes (each using T threads internally, where T can be chosen freely):

Compute c[i] = a[i] + b[i] + c[i-1] for all cells in the usual multi-threaded fashion, i.e. after dividing the input into ranges [0,r1), [r1,r2), ... [rk,n) and applying one thread to each range. Yes, this will be incorrect for all ranges except the first one, and step 2 will correct it.
Compute corrections, again in a multi-threaded fashion. For this, we look up the right-most value of each range, i.e. corr1:=c[r1-1], corr2:=corr1+c[r2-1], corr3:=corr2+c[r3-1] etc., which gives us the correcting value for each thread, and then calculate, again using multi-threading with the same ranges as before, c[i] += corrk where corrk is the thread-specific correcting value for the k-th thread. (For the zeroth thread, we can use corr0:=0, so that thread doesn't need to do anything.)

This improves the theoretical running time by a factor T where T is the number of threads (which can be chosen freely), so it's an optimal solution as far as multithreading is concerned.

To illustrate how this works, here is an example where we assume arrays of length n==30. We further assume that we use 3 threads: One to compute the range c[0..9], one for c[10..19] and one for c[20..29].

Clearly, the goal is that in cell c[i], for any 0<i<n, we get

c[i] == a[0]+...+a[i]+b[0]+...+b[i]

(i.e., the sum of all a[0..i] and all b[0..i]) after the algorithm has finished. Let's have a look at how the algorithm gets there, for an example cell i==23. This cell is handled by the third thread, i.e. the thread responsible for the range c[20..29].

Step 1: The thread sets

c[20] = a[20]+b[20]
c[21] = c[20]+a[21]+b[21] == a[20]+a[21]+b[20]+b[21]
c[22] = c[21]+a[22]+b[22] == a[20]+a[21]+a[22]+b[20]+b[21]+b[22]
c[23] = c[22]+a[23]+b[23] == a[20]+a[21]+a[22]+a[23]+b[20]+b[21]+b[22]+b[23]
...

So, after step 1 has finished, we have the some of a[20..23] and b[20..23] in cell c[23]. What is missing, is the sum of a[0..19] and b[0..19].

Similarly, the first and second threads have set the values in c[0..9] and c[10..19] such that

c[0] = a[0]+b[0]
c[1] = c[0]+a[1]+b[1] == a[0]+a[1]+b[0]+b[1]
...
c[9] = a[0]+...+a[9]+b[0]+...+b[9]

and

c[10] = a[10]+b[10]
...
c[19] = a[10]+...+a[19]+b[10]+...+b[19]

Step 2: The correcting value for the third thread, corr2 is the sum of corr1 and the right-most value computed by the second thread, while corr1 is the right-most value computed by first thread. Hence

corr2 == c[9]+c[19] == (a[0]+...+a[9]+b[0]+...+b[9]) + (a[10]+...+a[19]+b[10]+...+b[19])

And this is indeed the sum missing from the value of c[23] after step 1. In step 2, we add this value to all elements c[20..29], hence, after step 2 has finished, c[23] is correct (and so are all other cells).

The reason why this works is that the calculation of the cell values is a left-to-right, strictly incremental operation, and the order of operations for the calcuation of a single cell doesn't matter (because + is associative and commutative). Hence the end-result ("right-most value") of any given thread after step 1 can be used to correct the results of threads responsible for the ranges to its right in step 2.

Upvotes: 0