Evgenij Reznik
Evgenij Reznik

Reputation: 18594

Parallelize calculations

I need to calculate the mean and extract the root of some numbers from a huge file:

1, 2, 3, 4, 5,\n
6, 7, 8, 9, 10,\n
11, 12, 13, 14,15,\n
...

This is the code:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class App1{

    int res, c;
    double mean, root;
    ArrayList list = new ArrayList();

    public App1() {
        // einlesen
        Scanner sc = null;
        try {
            sc = new Scanner(new File("file.txt")).useDelimiter("[,\\s]+");
        } catch (FileNotFoundException ex) {
            System.err.println(ex);
        }
        while (sc.hasNextInt()) {
            list.add(sc.nextInt());
            res += (int) list.get(c);
            c++;
        }
        sc.close();

        // Mean
        mean = res / list.size();

        // Root
        root = Math.sqrt(mean);

        System.out.println("Mean: " + mean);
        System.out.println("Root: " + root);
    }

    public static void main(String[] args) {
    App1 app = new App1();
    }
}

Is there any way to parallelize it?

Before calculating the mean I need all the numbers, so one thread can't calculate while another is still fetching the numbers from the file.
The same thing with extracting the root: A thread can't extract it from the mean if the mean isn't calculated yet.

I thought about Future, would that be a solution?

Upvotes: 1

Views: 191

Answers (4)

TwoThe
TwoThe

Reputation: 14269

No there is no way to parallelize this. Although you could do something that looks like you are using threading, the result will be overly complex but still run at about the same speed as before.

The reason for this is that file access is and has to be single-threaded, and beside reading from file all you do is two add operations. So in best case those add operations could be parallelized, however since those take almost no execution time, the gain would be like 5% - 10% at best. And that time is negated (or worse) by the thread creation and maintenance.

Once thing you can do to speed things up would be to remove the part where you put things into a list (assuming that you don't need those values later).

 while (sc.hasNextInt()) {
   res += sc.nextInt();
   ++c;
 }

 mean = res / c;

Upvotes: 0

OldCurmudgeon
OldCurmudgeon

Reputation: 65813

There's something critical you will have to accept up front - you will not be able to process the data any faster than you can read it from the file. So first time how long it will take to read through the whole file and accept that you won't improve on that.

That said - have you considered a ForkJoinPool.

Upvotes: 3

Frederik
Frederik

Reputation: 1496

I think it's possible.

  1. int offset = (filesize / Number of Threads)
  2. Create n threads
  3. Each thread starts reading from offset * thread number. Eg Thread 0 starts reading from byte 0, thread 1 starts reading from offset * 1, thread 2 starts reading from offset * 2
  4. If thread num != 0, read ahead until you hit a newline character - start from there.
  5. Add up an average per thread. Save to "thread_average" or something.
  6. When all threads are finished, total average = average of all "thread_average" variables
  7. Square root the total average variable.

It will need a bit of messing around to make sure the threads don't read too far into another threads block of the file, but should be do-able

Upvotes: 0

rolfl
rolfl

Reputation: 17707

You can calculate the mean in parallel, because the mean is simply the sum divided by the count. There is no reason why you cannot sum up the values in parallel, and count them as well, then just do the division later.

Consider a class:

public class PartialSum() {
    private final int partialcount;
    private final int partialsum;
    public PartialSum(int count, int sum) {
        partialsum = sum;
        partialcount = count;
    public int getCount() {
        return partialcount;
    }
    public int getSum() {
        return partialsum;
    }
}

Now, this could be the return type of a Future, as in Future<PartialSum>.

So, what you need to do is split the file in parts, and then send the parts to individual threads.

Each thread calculates a PartialSum. Then, as the threads complete, you can:

int sum = 0;
int count = 0;
for(Future<PartialSum> partial : futures) {
    PartialSum ps = partial.get();
    sum += ps.getSum();
    count += ps.getCount();
}

double mean = (double)sum / count;
double root = ....

Upvotes: 1

Related Questions