Reputation: 462
I have a piece of code that must run extremely fast in terms of clock speed. The algorithm is already in O(N). It takes 2seconds, it needs to take 1s. For most A.length inputs ~ 100,000 it takes .3s unless a particular line of code is invoked an extreme number of times. (For an esoteric programming challenge)
It uses a calculation of the arithmetic series that 1,2,..N -> 1,3,4,10,15.. that can be represented by n*(n+1)/2 I loop through this equation hundreds of thousands of times. I do not have access to the input, nor can I display it. The only information I am able to get returned is the time it took to run. particularly the equation is:
s+=(n+c)-((n*(n+1))/2);
s and c can have values range from 0 to 1Billion
n can range 0 to 100,000
What is the most efficient way to write this statement in terms of clock speed? I have heard division takes more time then multiplication, but beyond that I could not determine whether writing this in one line or multiple assignment lines was more efficient. Dividing and multiplying versus multiplying and then dividing? Also would creating custom integers types significantly help?
Edit as per request, full code with small input case (sorry if it's ugly, I've just kept stripping it down):
public static void main(String[] args) {
int A[]={3,4,8,5,1,4,6,8,7,2,2,4};//output 44
int K=6;
//long start = System.currentTimeMillis();;
//for(int i=0;i<100000;i++){
System.out.println(mezmeriz4r(A,K));
//}
//long end = System.currentTimeMillis();;
// System.out.println((end - start) + " ms");
}
public static int mezmeriz4r(int[]A,int K){
int s=0;
int ml=s;
int mxl=s;
int sz=1;
int t=s;
int c=sz;
int lol=50000;
int end=A.length;
for(int i=sz;i<end;i++){
if(A[i]>A[mxl]){
mxl=i;
}else if(A[i]<A[ml]){
ml=i;
}
if(Math.abs(A[ml]-A[mxl])<=K){
sz++;
if(sz>=lol)return 1000000000;
if(sz>1){
c+=sz;
}
}else{
if(A[ml]!=A[i]){
t=i-ml;
s+=(t+c)-((t*(t+1))/(short)2);
i=ml;
ml++;
mxl=ml;
}else{
t=i-mxl;
s+=(t+c)-((t*(t+1))/(short)2);
i=mxl;
mxl++;
ml=mxl;
}
c=1;
sz=0;
}
}
if(s>1000000000)return 1000000000;
return s+c;
}
Returned from Challenge:
Detected time complexity:
O(N)
test time result
example example test 0.290 s. OK
single single element 0.290 s. OK
double two elements 0.290 s. OK
small_functional small functional tests 0.280 s. OK
small_random small random sequences length = ~100 0.300 s. OK
small_random2 small random sequences length = ~100 0.300 s. OK
medium_random chaotic medium sequences length = ~3,000 0.290 s. OK
large_range large range test, length = ~100,000 2.200 s. TIMEOUT ERROR running time: >2.20 sec., time limit: 1.02 sec.
large_random random large sequences length = ~100,000 0.310 s. OK
large_answer test with large answer 0.320 s. OK
large_extreme all maximal value = ~100,000 0.340 s. OK
Upvotes: 2
Views: 1667
Reputation: 10161
I would try the following and profile the code after each change to check if there is any gain in speed.
replace:
if(Math.abs(A[ml]-A[mxl])<=K)
by
int diff = A[ml]-A[mxl];
if(diff<=K && diff>=-K)
replace
/2
by
>>1
replace
ml++;
mxl=ml;
by
mxl=++ml;
Maybe avoid array access of the same element (internal boundary checks of java may take some time)
So staore at least A[i]
in a local varianble.
Upvotes: 1
Reputation: 3364
I dont have access to validate all inputs. and time range. but this one runs O(N) for sure. and have improved. run and let me know your feedback.i will provide details if necessary
public static int solution(int[]A,int K){
int minIndex=0;
int maxIndex=0;
int end=A.length;
int slize = end;
int startIndex = 0;
int diff = 0;
int minMaxIndexDiff = 0;
for(int currIndex=1;currIndex<end;currIndex++){
if(A[currIndex]>A[maxIndex]){
maxIndex=currIndex;
}else if(A[currIndex]<A[minIndex]){
minIndex=currIndex;
}
if( (A[maxIndex]-A[minIndex]) >K){
minMaxIndexDiff= currIndex- startIndex;
if (minMaxIndexDiff > 1){
slize+= ((minMaxIndexDiff*(minMaxIndexDiff-1)) >> 1);
if (diff > 0 ) {
slize = slize + (diff * minMaxIndexDiff);
}
}
if (minIndex == currIndex){
diff = currIndex - (maxIndex + 1);
}else{
diff = currIndex - (minIndex + 1);
}
if (slize > 1000000000) {
return 1000000000;
}
minIndex = currIndex;
maxIndex = currIndex;
startIndex = currIndex;
}
}
if ( (startIndex +1) == end){
return slize;
}
if (slize > 1000000000) {
return 1000000000;
}
minMaxIndexDiff= end- startIndex;
if (minMaxIndexDiff > 1){
slize+= ((minMaxIndexDiff*(minMaxIndexDiff-1)) >> 1);
if (diff > 0 ) {
slize = slize + (diff * minMaxIndexDiff);
}
}
return slize;
}
Upvotes: 1
Reputation: 15313
Nested assignments, i. e. instead of
t=i-ml;
s+=(t+c)-((t*(t+1))/(short)2);
i=ml;
ml++;
mxl=ml;
something like
s+=((t=i-ml)+c);
s-=((t*(t+1))/(short)2);
i=ml;
mxl=++ml;
sometimes occurs in OpenJDK sources. It mainly results in replacing *load
bytecode instructions with *dup
s. According to my experiments, it really gives a very little speedup, but it is ultra hadrcore, I don't recommend to write such code manually.
Upvotes: 1
Reputation: 28757
I would try to elimnate this line if(Math.abs(A[ml]-A[mxl])<=
by a faster self calculated abs version, which is inlined, not a method call!
The cast to (short) does not help, but try the right shift operator X >>1 instead x / 2
removing the System.out.println() can speed up by factor of 1000. But be carefull otherwise your whole algorithm can be removed by the VM becasue you dont use it. Old code:
for(int i=0;i<100000;i++){
System.out.println(mezmeriz4r(A,K));
}
New code:
int dummy = 0;
for(int i=0;i<100000;i++){
dummy = mezmeriz4r(A,K);
}
//Use dummy otherwise optimisation can remove mezmeriz4r
System.out.print("finished: " + dummy);
Upvotes: 0
Reputation: 8128
I would create a C version first and see, how fast it can go with "direct access to the metal". Chances are, you are trying to optimize calculation which is already optimized to the limit.
Upvotes: 0
Reputation: 4436
Get rid of the System.out.println() in the for loop :) you will be amazed how much faster your calculation will be
Upvotes: 1
Reputation: 178303
With a little algebra, you can simply the expression (n+c)-((n*(n+1))/2)
to c-((n*(n-1))/2)
to remove an addition operation. Then you can replace the division by 2
with a bit-shift to the right by 1
, which is faster than division. Try replacing
s+=(n+c)-((n*(n+1))/2);
with
s+=c-((n*(n-1))>>1);
Upvotes: 3