firewithin

Reputation: 569

Weird Execution Times

I am seeing discontinuities in the execution times of this program as the input size varies. Specifically, I have been running the following code:

#include <iostream>
#include <sstream>
#include <cstdlib>
#include <ctime>
using namespace std;

long double a[2000][2000];
int iter = 0;

int main(int argc, char const *argv[]){
    istringstream is(argv[1]);          // N is passed on the command line
    int N;
    is >> N;
    // fill the matrix with small random values
    for(int i = 0; i <= N; ++i){
        for(int j = 0; j <= N; ++j){
            a[i][j] = (rand()%3+1)*(rand()%4+1);
        }
    }
    clock_t clk = clock();
    // in-place LU factorization (Gaussian elimination without pivoting)
    // on the leading N x N block
    for(int k = 0; k < N; ++k){
        for(int i = k+1; i < N; ++i){
            a[i][k] = a[i][k]/a[k][k];
        }
        for(int i = k+1; i < N; ++i){
            for(int j = k+1; j < N; ++j){
                iter++;
                a[i][j] = a[i][j] - a[i][k]*a[k][j];
            }
        }
    }
    clk = clock() - clk;
    cout << "Time: " << ((double)clk)/CLOCKS_PER_SEC << "\n";
    cout << iter << endl;
}

I compile it with g++ 5.4.1 as C++14.

I tried the code for various values of N, and something really weird happens around N = 500. The measured execution times are listed below:

N = 200 : 0.022136
N = 300 : 0.06792
N = 400 : 0.149622
N = 500 : 11.8341
N = 600 : 0.508186
N = 700 : 0.805481
N = 800 : 1.2062
N = 900 : 1.7092
N = 1000 : 2.35809

I ran the N = 500 case many times, and also on another machine, and got similar results every time.

Around N = 500 the timings look like this:

N = 494 : 0.282626
N = 495 : 0.284564
N = 496 : 11.5308
N = 497 : 0.288031
N = 498 : 0.289903
N = 499 : 11.9615
N = 500 : 12.4032
N = 501 : 0.293737
N = 502 : 0.295729
N = 503 : 0.297859
N = 504 : 12.4154
N = 505 : 0.301002
N = 506 : 0.304718
N = 507 : 12.4385

Why is this happening?

Upvotes: 12

Views: 314

Answers (1)

geza

Reputation: 30010

Your program runs into floating point overflows and operations that produce NaN in certain cases. Once a calculation results in infinity or NaN, it spreads through the rest of your algorithm, so almost all of the numbers end up as infinity/NaN. Whether this happens depends on rand()'s output: if you change the seed with srand(), you may not get a slowdown for the N = 500 case at all.
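One way to check whether that is what happens for your matrix is to rerun the elimination with a few different seeds and count how many entries end up infinite or NaN. This is only an illustrative sketch (the helpers fill, eliminate and count_nonfinite are mine, not part of your program), but it reuses your exact loops:

#include <cmath>
#include <cstdlib>
#include <iostream>

long double a[2000][2000];

// Fill the matrix exactly as in the question, but with a chosen seed.
// Seed 1 matches the default sequence you get when srand() is never called.
void fill(int N, unsigned seed) {
    srand(seed);
    for (int i = 0; i <= N; ++i)
        for (int j = 0; j <= N; ++j)
            a[i][j] = (rand() % 3 + 1) * (rand() % 4 + 1);
}

// The same elimination loop as in the question.
void eliminate(int N) {
    for (int k = 0; k < N; ++k) {
        for (int i = k + 1; i < N; ++i)
            a[i][k] = a[i][k] / a[k][k];
        for (int i = k + 1; i < N; ++i)
            for (int j = k + 1; j < N; ++j)
                a[i][j] = a[i][j] - a[i][k] * a[k][j];
    }
}

// Count the entries that are infinity or NaN after elimination.
long count_nonfinite(int N) {
    long bad = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (!std::isfinite(a[i][j]))
                ++bad;
    return bad;
}

int main() {
    const int N = 500;
    for (unsigned seed : {1u, 2u, 3u}) {
        fill(N, seed);
        eliminate(N);
        std::cout << "seed " << seed << ": "
                  << count_nonfinite(N) << " non-finite entries\n";
    }
}

If the seeds and sizes that report many non-finite entries are exactly the ones that run slowly, that confirms the connection.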

And because you use long double, the compiled program uses the x87 FPU (you can reproduce this with float or double as well, if you compile for the FPU instead of SSE). It turns out that the FPU handles infinities and NaNs much more slowly than normal numbers.

You can easily reproduce this issue with this snippet:

int main() {
    // volatile keeps the compiler from optimizing the loop away
    volatile long double z = 2;

    for (int i=0; i<10000000; i++) {
        z *= z;   // starting from 2, z overflows to infinity after a few iterations
    }

    return z;
}

If z starts at 2, this program runs slowly (z overflows to infinity). If you initialize it to 1 instead, it becomes fast (z never overflows).
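To see that the slowdown is tied to x87 code rather than to long double itself, a sketch along these lines (my own, not from the original answer) runs the same overflowing loop with double: compiled normally for x86-64 it uses SSE math, and compiled with GCC's -mfpmath=387 flag it is forced onto the x87 FPU, where the slowdown should reappear.

// Sketch: the same overflowing multiply loop with double instead of long double.
// Build it twice (the file name is illustrative):
//
//   g++ -O2 overflow_double.cpp -o sse_version                 (default x86-64: SSE math)
//   g++ -O2 -mfpmath=387 overflow_double.cpp -o x87_version    (force x87 FPU math)
//
// The SSE build should stay fast even though z overflows to infinity;
// the x87 build should show a large slowdown, like the long double snippet above.
#include <ctime>
#include <iostream>

int main() {
    volatile double z = 2;   // start at 2 so z overflows to infinity after a few iterations
    clock_t clk = clock();
    for (int i = 0; i < 10000000; ++i) {
        z *= z;
    }
    clk = clock() - clk;
    std::cout << "Time: " << (double)clk / CLOCKS_PER_SEC << "\n";
    return 0;
}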

You can read more about this here: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/

Here's the relevant part:

Performance implications on the x87 FPU

The performance of Intel’s x87 units on these NaNs and infinites is pretty bad. [...] Even today, on a SandyBridge processor, the x87 FPU causes a slowdown of about 370 to one on NaNs and infinities.

Upvotes: 2
