Reputation: 5800
I wrote a little program to calculate the first 18 triples (x,y,z)
with x<y<z
, which satisfy x^3+y^3=z^3+1
.
While playing around to optimise the total runtime, I discovered, that using double
for the cubic values and the two sides of the equation is faster than using long
. On my machine the difference is about 3 seconds.
Now I wonder why exactly this is the case. I guess it is somewhere in the internal handling of long
while the comparison of two long
-Variables, as this is the only thing, which changes within the calculation loops.
Here is my code:
class Threes {
public static void main(String[] args) {
System.out.println("Threes --- Java");
int Z_MAX = 60000, Y_MAX = Z_MAX-1, X_MAX = Y_MAX-1;
double[] powers = new double[Z_MAX+1];
for (int i = 0; i <= Z_MAX; i++) {
powers[i] = Math.pow(i, 3);
}
System.out.println("Powers calculated");
int x, y, z;
double right, left;
int[][] sets = new int[18][3];
int foundCount = 0;
long loopCount = 0;
long start, end;
start = System.currentTimeMillis();
for (x = 1 ; x < X_MAX; x++) {
for (y = x + 1; y < Y_MAX; y++) {
right = powers[x] + powers[y];
for (z = y + 1; z < Z_MAX; z++) {
left = powers[z] + 1;
if (right < left) {
z = Z_MAX;
} else if (right == left) {
sets[foundCount][0] = x;
sets[foundCount][1] = y;
sets[foundCount][2] = z;
foundCount++;
end = System.currentTimeMillis();
System.out.println("found " + foundCount + ". set:\t" + x + "\t" + y + "\t" + z + "\t" + ((end - start) / 1000.0));
if (foundCount == 18) {
x = X_MAX;
y = Y_MAX;
z = Z_MAX;
}
}
loopCount++;
}
}
}
System.out.println("finished: " + loopCount);
}
}
The lines I changed are:
double[] powers = new double[Z_MAX+1];
becomes
long[] powers = new long[Z_MAX+1];
and
powers[i] = Math.pow(i, 3);
becomes
powers[i] = (long)Math.pow(i, 3);
and
double right, left;
becomes
long right, left;
"Bonus Question": What other possibilities of optimizing the whole code in terms of total runtime do I have? I know, that leaving out the loopCount
gives me some milliseconds. I'm sure, that I have to reduce the number of loop iterations significantly. But how?
Upvotes: 5
Views: 980
Reputation: 7731
Your biggest "bonus" optimization will be to replace the z
loop with a calculation like:
z = Math.round(Math.pow(left - 1, 1./3));
and check if z > y && left == powers[(int)z] + 1
.
Other improvements if you wanted to find all triples within your limits:
x
at 2 instead of 1z = Z_MAX;
with break;
to exit the loop early X_MAX
as Math.pow((powers[Z_MAX] + 1)/2, 1./3)
~= Z_MAX * Math.pow(0.5, 1./3)
since if x
is bigger than that, z
will exceed Z_MAX
Y_MAX
for each x
as Math.pow(powers[Z_MAX] - powers[x] + 1, 1./3)/2
BTW, a more common way to order the triples would be using z as the primary sort key, which may result in a different first 18 than you get ordering by x first. To change that, you'd make your outer loop iterate over z, which would be simpler anyway:
for (z = 1; z < Z_MAX; z++) {
for (y = 1; y < z - 1; y++) {
zy = powers[z] - 1 - powers[y];
x = Math.round(Math.pow(zy, 1./3));
if (x < y && zy == powers[(int)x])
...report triple found;
}
}
Upvotes: 1
Reputation: 272667
It's probably fair to say that your code spends most of its time in this section:
for (z = y + 1; z < Z_MAX; z++) {
left = powers[z] + 1;
if (right < left) {
z = Z_MAX;
}
And most of the time, it will always be taking the same branch out of the conditional. So once your code has reached the steady-state (i.e. once the CPU's branch predictor is set up), the run-time will be dominated by the computation itself: dependencies are minimised, so the latency of the instruction pipeline doesn't matter.
On a 32-bit machine, doing addition and comparison on 64-bit integer types takes more instructions than doing the equivalent on double
s. A double
calculation will take more cycles to complete, but that doesn't matter. We're dominated by instruction throughput, not latency. So the overall run-time will be longer.
In terms of further optimization, you could move the +1 outside the inner loop, by calculating right = powers[x] + powers[y] - 1
. But it's possible the optimizer has already spotted that.
Upvotes: 3
Reputation: 953
If you are using 32-bit operating system the performance for long-variable could be worse because long is 64-bit type. For example, with 64-bit OS Java could do the comparison with just one machine instruction, but in 32-bit environment it has to use multiple machine instructions, since it can only handle 32-bit at the time.
But for double, this is not neccessary the case, since 32-bit systems have machine instructions for 64-bit floating point numbers, even when thet don't have them for 64-bit integers.
Also, with code:
powers[i] = (long)Math.pow(i, 3);
there is two unneccesary conversions, first i (integer) is converted to double (that's what Math.pow takes) and then the return value is converted back to 64-bit integer (long).
Upvotes: 9