shodanex

Reputation: 15406

Why is my loop slower when I remove code?

When I remove the tests that compute the minimum and maximum from the loop, the execution time is actually longer than with the tests. How is that possible?

Edit: After running more tests, it seems the runtime is not constant, i.e. the same code can run in 9 s or 13 s. So it was just a repeatable coincidence. Repeatable until you do enough tests, that is.

Some details :

My guess: bad cache interaction?

void    FillFullValues(void)
{
    int i,j,k;
    double  X,Y,Z;
    double  p,q,r,p1,q1,r1;
    double  Ls,as,bs;
    unsigned long t1, t2;

    t1 = GET_TICK_COUNT();  
    MinLs = Minas = Minbs = 1000000.0;
    MaxLs = Maxas = Maxbs = 0.0;

    for (i=0;i<256;i++)
    {
        for (j=0;j<256;j++)
        {
            for (k=0;k<256;k++)
            {
                X = 0.4124*CielabValues[i] + 0.3576*CielabValues[j] + 0.1805*CielabValues[k];
                Y = 0.2126*CielabValues[i] + 0.7152*CielabValues[j] + 0.0722*CielabValues[k];
                Z = 0.0193*CielabValues[i] + 0.1192*CielabValues[j] + 0.9505*CielabValues[k];

                p = X * InvXn;
                q = Y;
                r = Z * InvZn;

                if (q>0.008856)
                {
                    Ls = 116*pow(q,third)-16;
                }
                else
                {
                    Ls = 903.3*q;
                }

                if (q<=0.008856)
                {
                    q1 = 7.787*q+seiz;
                }
                else
                {
                    q1 = pow(q,third);
                }

                if (p<=0.008856)
                {
                    p1 = 7.787*p+seiz;
                }
                else
                {
                    p1 = pow(p,third);
                }

                if (r<=0.008856)
                {
                    r1 = 7.787*r+seiz;
                }
                else
                {
                    r1 = pow(r,third);
                }

                as = 500*(p1-q1);
                bs = 200*(q1-r1);

                //
                // cast to char to reduce array size
                //
                FullValuesLs[i][j][k] = (char) (Ls);
                FullValuesas[i][j][k] = (char) (as);
                FullValuesbs[i][j][k] = (char) (bs);

                //// Remove this and get slower code
                if (MaxLs<Ls)
                    MaxLs = Ls;
                if ((abs(Ls)<MinLs) && (abs(Ls)>0))
                    MinLs = Ls;

                if (Maxas<as)
                    Maxas = as;
                if ((abs(as)<Minas) && (abs(as)>0))
                    Minas = as;

                if (Maxbs<bs)
                    Maxbs = bs;
                if ((abs(bs)<Minbs) && (abs(bs)>0))
                    Minbs = bs;
                //// End of Remove

            }
        }
    }

    TRACE(_T("LMax = %f LMin = %f\n"),(MaxLs),(MinLs));
    TRACE(_T("aMax = %f aMin = %f\n"),(Maxas),(Minas));
    TRACE(_T("bMax = %f bMin = %f\n"),(Maxbs),(Minbs));
    t2 = GET_TICK_COUNT();
    TRACE(_T("WhiteBalance init : %lu ms\n"), t2 - t1); 
}

Upvotes: 4

Views: 260

Answers (2)

flolo

Reputation: 15486

Maybe it's the cache, maybe it's an unrolling problem. There is only one way to answer this: look at the generated code (e.g. by using the -S option). Maybe you can post it, or spot the difference by comparing the two versions.

EDIT: Now that you have clarified that it was just the measurement, I can only recommend (or better, command ;-) that whenever you want runtime numbers, you ALWAYS put the code in a loop and average the results. It is best to do this outside your program (in a shell script), so that your cache is not already filled with the right data.
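
A minimal sketch of the in-process part of that advice, assuming the standard `clock()` timer is good enough (on POSIX systems it reports CPU time, not wall-clock time). `struct timing`, `time_runs` and `sample_work` are hypothetical names used for illustration only; they are not part of the question's code:

```c
#include <assert.h>
#include <time.h>

/* Summary of several timed runs, in seconds. */
struct timing {
    double min;
    double mean;
    double max;
};

/* Hypothetical stand-in workload; replace with the routine under test. */
void sample_work(void)
{
    volatile double sink = 0.0;   /* volatile keeps the loop from being optimized away */
    for (int i = 0; i < 1000000; i++)
        sink += i * 0.5;
}

/* Run fn `runs` times and report the fastest, mean and slowest run. */
struct timing time_runs(void (*fn)(void), int runs)
{
    assert(runs > 0);

    clock_t total = 0, fastest = 0, slowest = 0;
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fn();
        clock_t elapsed = clock() - start;

        if (i == 0 || elapsed < fastest)
            fastest = elapsed;
        if (i == 0 || elapsed > slowest)
            slowest = elapsed;
        total += elapsed;
    }

    struct timing t;
    t.min  = (double)fastest / CLOCKS_PER_SEC;
    t.max  = (double)slowest / CLOCKS_PER_SEC;
    t.mean = ((double)total / runs) / CLOCKS_PER_SEC;
    return t;
}
```

Reporting the minimum and maximum next to the mean makes a spread like the 9 s versus 13 s seen in the question visible instead of hiding it in a single number. To time the original routine, one would pass `FillFullValues` in place of `sample_work`.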

Upvotes: 1

Elalfer

Reputation: 5338

I think the compiler is trying to unroll the inner loop because you are removing the dependency between iterations, but somehow this doesn't help in your case. Maybe the loop is too big and uses too many registers to be unrolled.

Try turning off unrolling and post the results again.

If this is the case, I would suggest that you submit a performance issue to gcc.

PS: I think you can merge the if (q>0.008856) and if (q<=0.008856) tests.

Upvotes: 2
