Reputation: 569
The problem is about getting some discontinuities in the execution time sequence for various input sizes. Specifically, I have been trying this code:
long double a[2000][2000];
int iter = 0;
int main(int argc, char const *argv[]){
istringstream is(argv[1]);
int N;
is >> N;
for(int i = 0; i <= N; ++i){
for (int J = 0; J <= N; ++J){
a[i][J] = (rand()%3+1)*(rand()%4+1);
}
}
clock_t clk= clock();
for(int k = 0; k < N; ++k){
for(int i = k+1; i < N; ++i){
a[i][k] = a[i][k]/a[k][k];
}
for(int i = k+1; i < N; ++i){
for(int j = k+1; j < N; ++j){
iter++;
a[i][j] = a[i][j] - a[i][k]*a[k][j];
}
}
}
clk = clock() - clk;
cout << "Time: " << ((double)clk)/CLOCKS_PER_SEC << "\n";
cout << iter << endl;
}
using g++ 5.4.1 for C++14 compilation.
I tried the code for various values of N. However something really weird happens around N = 500. Execution times are listed below. (These are the outputs of the code for various values of N.
N = 200 : 0.022136
N = 300 : 0.06792
N = 400 : 0.149622
N = 500 : 11.8341
N = 600 : 0.508186
N = 700 : 0.805481
N = 800 : 1.2062
N = 900 : 1.7092
N = 1000 : 2.35809
I tried for N = 500 a lot of times and also on another machine only to get similar results.
Around 500 we have the following:
N = 494 : 0.282626
N = 495 : 0.284564
N = 496 : 11.5308
N = 497 : 0.288031
N = 498 : 0.289903
N = 499 : 11.9615
N = 500 : 12.4032
N = 501 : 0.293737
N = 502 : 0.295729
N = 503 : 0.297859
N = 504 : 12.4154
N = 505 : 0.301002
N = 506 : 0.304718
N = 507 : 12.4385
Why is this happening?
Upvotes: 12
Views: 314
Reputation: 30010
Your program could have floating point overflows and operations which result in NaN for certain cases (if a calculation results in infinity/NaN, then it spreads for your algorithm, so almost all the numbers become infinity/NaN. It depends on rand()
's output. If you change the seed with srand()
, you may not get a slowdown for the N=500
case).
And, because you use long double
, the compiled program uses FPU (you can reproduce this with float
or double
as well, if you compile for FPU instead of SSE). It seems, that FPU handles infinite numbers much slower than normal numbers.
You can easily reproduce this issue with this snippet:
int main() {
volatile long double z = 2;
for (int i=0; i<10000000; i++) {
z *= z;
}
return z;
}
If you use 2 for z
, this program runs slowly (z
will overflow). If you replace it with 1, it becomes fast (z
won't overflow).
You can read more about this here: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
Here's the relevant part:
Performance implications on the x87 FPU
The performance of Intel’s x87 units on these NaNs and infinites is pretty bad. [...] Even today, on a SandyBridge processor, the x87 FPU causes a slowdown of about 370 to one on NaNs and infinities.
Upvotes: 2