SU3
SU3

Reputation: 5387

Three task parallel computation

I have a loop that goes through tens of millions of cycles, each cycle corresponding to a row of data file I'm reading. There are three sequential computations inside the loop. Loosely speaking, we can label them (a) read data, (b) process data, (c) accumulate results. (a), (b), and (c) take about the same time individually. (b) depends on (a), and (c) depends on (a) and (b). I think that if I make the program run in 3 threads, with each thread behind by one computation from it's neighbor, I can get about a factor of 3 speedup. Unfortunately, I'm not familiar with multithreading.

The way I see the design is like this:

  1. The first reads row n (a);
  2. When this is done, the first thread processes the row (b), and at the same time the second thread reads row n+1;
  3. When the second thread is done reading row n+1, it starts processing it, and the third thread reads row n+2. If the first thread is done with (b) it moves on the (c).

In other words, the sequence of steps is like this:

1a
1b 2a
1c 2b 3a
1a 2c 3b
1b 2a 3c
1c 2b 3a

and so on.

So, a single row always stays on the same thread. The next thread starts a new row when it's done with it's own and the other two threads have read the two preceding rows.

Can somebody help me set this up? These are the only constraints:

I also understand that each thread will have to have independent storage.

Forgot to mention: each row is processed entirely independently.

Upvotes: 1

Views: 137

Answers (1)

Gilles Schreuder
Gilles Schreuder

Reputation: 68

Assuming that the questiom may be rephrased as "how to improve program performance", and that the file is a sequential file residing on a hard disk:

Do not read record by record, but read many (say 1000000) records as a large chunk of data, and then retrieve the records from the buffer for processing.

A little test with a compiled C program on a system with i5-3220M CPU and 8GB RAM and SSD drive:

Reading a file of 14 million 80 byte records one by on took around 15 seconds, while reading the file in chunks of 1000000 records was just subsecond.

I would not be surprised if the relative improvement is larger for a mechanical disk.

I would also not be surprised if the benefits of a multithread processing approach would not justify its development costs.

Upvotes: 1

Related Questions