Reputation: 18594
I need to calculate the mean and extract the root of some numbers from a huge file:
1, 2, 3, 4, 5,\n
6, 7, 8, 9, 10,\n
11, 12, 13, 14,15,\n
...
This is the code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
public class App1{
int res, c;
double mean, root;
ArrayList list = new ArrayList();
public App1() {
// einlesen
Scanner sc = null;
try {
sc = new Scanner(new File("file.txt")).useDelimiter("[,\\s]+");
} catch (FileNotFoundException ex) {
System.err.println(ex);
}
while (sc.hasNextInt()) {
list.add(sc.nextInt());
res += (int) list.get(c);
c++;
}
sc.close();
// Mean
mean = res / list.size();
// Root
root = Math.sqrt(mean);
System.out.println("Mean: " + mean);
System.out.println("Root: " + root);
}
public static void main(String[] args) {
App1 app = new App1();
}
}
Is there any way to parallelize it?
Before calculating the mean I need all the numbers, so one thread can't calculate while another is still fetching the numbers from the file.
The same thing with extracting the root: A thread can't extract it from the mean if the mean isn't calculated yet.
I thought about Future, would that be a solution?
Upvotes: 1
Views: 191
Reputation: 14269
No there is no way to parallelize this. Although you could do something that looks like you are using threading, the result will be overly complex but still run at about the same speed as before.
The reason for this is that file access is and has to be single-threaded, and beside reading from file all you do is two add operations. So in best case those add operations could be parallelized, however since those take almost no execution time, the gain would be like 5% - 10% at best. And that time is negated (or worse) by the thread creation and maintenance.
Once thing you can do to speed things up would be to remove the part where you put things into a list (assuming that you don't need those values later).
while (sc.hasNextInt()) {
res += sc.nextInt();
++c;
}
mean = res / c;
Upvotes: 0
Reputation: 65813
There's something critical you will have to accept up front - you will not be able to process the data any faster than you can read it from the file. So first time how long it will take to read through the whole file and accept that you won't improve on that.
That said - have you considered a ForkJoinPool.
Upvotes: 3
Reputation: 1496
I think it's possible.
It will need a bit of messing around to make sure the threads don't read too far into another threads block of the file, but should be do-able
Upvotes: 0
Reputation: 17707
You can calculate the mean in parallel, because the mean is simply the sum divided by the count. There is no reason why you cannot sum up the values in parallel, and count them as well, then just do the division later.
Consider a class:
public class PartialSum() {
private final int partialcount;
private final int partialsum;
public PartialSum(int count, int sum) {
partialsum = sum;
partialcount = count;
public int getCount() {
return partialcount;
}
public int getSum() {
return partialsum;
}
}
Now, this could be the return type of a Future, as in Future<PartialSum>
.
So, what you need to do is split the file in parts, and then send the parts to individual threads.
Each thread calculates a PartialSum
. Then, as the threads complete, you can:
int sum = 0;
int count = 0;
for(Future<PartialSum> partial : futures) {
PartialSum ps = partial.get();
sum += ps.getSum();
count += ps.getCount();
}
double mean = (double)sum / count;
double root = ....
Upvotes: 1