Reputation: 15406
When I remove the tests that compute the minimum and maximum from the loop, the execution time is actually longer than with the tests. How is that possible?
Edit: After running more tests, it seems the runtime is not constant, i.e. the same code can run in 9 sec or 13 sec. So it was just a repeatable coincidence. Repeatable until you do enough tests, that is...
Some details:
CFLAGS=-Wall -O2 -fPIC -g
My guess: bad cache interaction?
void FillFullValues(void)
{
    int i,j,k;
    double X,Y,Z;
    double p,q,r,p1,q1,r1;
    double Ls,as,bs;
    unsigned long t1, t2;

    t1 = GET_TICK_COUNT();
    MinLs = Minas = Minbs = 1000000.0;
    MaxLs = Maxas = Maxbs = 0.0;
    for (i=0;i<256;i++)
    {
        for (j=0;j<256;j++)
        {
            for (k=0;k<256;k++)
            {
                X = 0.4124*CielabValues[i] + 0.3576*CielabValues[j] + 0.1805*CielabValues[k];
                Y = 0.2126*CielabValues[i] + 0.7152*CielabValues[j] + 0.0722*CielabValues[k];
                Z = 0.0193*CielabValues[i] + 0.1192*CielabValues[j] + 0.9505*CielabValues[k];
                p = X * InvXn;
                q = Y;
                r = Z * InvZn;
                if (q>0.008856)
                {
                    Ls = 116*pow(q,third)-16;
                }
                else
                {
                    Ls = 903.3*q;
                }
                if (q<=0.008856)
                {
                    q1 = 7.787*q+seiz;
                }
                else
                {
                    q1 = pow(q,third);
                }
                if (p<=0.008856)
                {
                    p1 = 7.787*p+seiz;
                }
                else
                {
                    p1 = pow(p,third);
                }
                if (r<=0.008856)
                {
                    r1 = 7.787*r+seiz;
                }
                else
                {
                    r1 = pow(r,third);
                }
                as = 500*(p1-q1);
                bs = 200*(q1-r1);
                //
                // cast to char to reduce array size
                //
                FullValuesLs[i][j][k] = (char) (Ls);
                FullValuesas[i][j][k] = (char) (as);
                FullValuesbs[i][j][k] = (char) (bs);
                //// Remove this and get slower code
                // fabs() rather than abs(): abs() takes an int and would truncate the double
                if (MaxLs<Ls)
                    MaxLs = Ls;
                if ((fabs(Ls)<MinLs) && (fabs(Ls)>0))
                    MinLs = Ls;
                if (Maxas<as)
                    Maxas = as;
                if ((fabs(as)<Minas) && (fabs(as)>0))
                    Minas = as;
                if (Maxbs<bs)
                    Maxbs = bs;
                if ((fabs(bs)<Minbs) && (fabs(bs)>0))
                    Minbs = bs;
                //// End of Remove
            }
        }
    }
    TRACE(_T("LMax = %f LMin = %f\n"),(MaxLs),(MinLs));
    TRACE(_T("aMax = %f aMin = %f\n"),(Maxas),(Minas));
    TRACE(_T("bMax = %f bMin = %f\n"),(Maxbs),(Minbs));
    t2 = GET_TICK_COUNT();
    TRACE(_T("WhiteBalance init : %lu ms\n"), t2 - t1);
}
Upvotes: 4
Views: 260
Reputation: 15486
Maybe it's the cache, maybe it's an unrolling problem. There is only one way to answer this: look at the generated code (e.g. by using the -S option). Maybe you can post it, or spot the difference yourself when comparing the two versions.
EDIT: Now that you have clarified that it was just the measurement, I can only recommend (or better, command ;-) that whenever you want runtime numbers, you ALWAYS put the code into a loop and average the results. It is best to do this outside your program (in a shell script), so your cache is not already filled with the right data.
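For the in-program variant of that advice, here is a minimal sketch of a repeat-and-average harness. It assumes a POSIX system with clock_gettime; work(), now_ms(), timing_stats and time_runs() are hypothetical names made up for this illustration, and work() merely stands in for the function you actually want to measure (FillFullValues in the question).

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on POSIX systems */
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the code under test. */
static volatile double sink;  /* volatile so the loop is not optimized away */
static void work(void)
{
    double s = 0.0;
    for (int i = 0; i < 1000000; i++)
        s += i * 0.5;
    sink = s;
}

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

typedef struct { double min_ms, mean_ms, max_ms; } timing_stats;

/* Runs fn() `runs` times and reports the fastest, average and slowest run. */
static timing_stats time_runs(void (*fn)(void), int runs)
{
    assert(runs > 0);
    timing_stats st = {0.0, 0.0, 0.0};
    double total = 0.0;
    for (int r = 0; r < runs; r++) {
        double t0 = now_ms();
        fn();
        double dt = now_ms() - t0;
        if (r == 0 || dt < st.min_ms) st.min_ms = dt;
        if (r == 0 || dt > st.max_ms) st.max_ms = dt;
        total += dt;
    }
    st.mean_ms = total / runs;
    return st;
}
```

Calling time_runs(work, 20) and printing all three fields shows how wide the spread is. A large gap between the fastest and slowest run, like the 9 sec versus 13 sec in the question, means a single measurement cannot be used to compare two versions of the code. Note that repeated runs inside one process start with a warm cache, which is why running the whole program several times from a script is still the more honest measurement.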
Upvotes: 1
Reputation: 5338
I think the compiler is trying to unroll the inner loop because you are removing a dependency between iterations. But somehow this doesn't help in your case, maybe because the loop is too big and uses too many registers to be unrolled.
Try turning off unrolling and post the results again.
If this is the case, I would suggest submitting a performance issue to gcc.
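One way to test the unrolling guess on a single loop, rather than for the whole file with -fno-unroll-loops, is GCC's loop pragma. This is only a sketch: it assumes GCC 8 or later (other compilers may ignore or warn about the pragma), and sum_no_unroll() is a hypothetical function standing in for the real inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n doubles. The pragma asks GCC not to unroll the loop that follows. */
static double sum_no_unroll(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
#pragma GCC unroll 1
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

If the timings with and without the pragma are indistinguishable once they are averaged over many runs, unrolling is probably not the cause.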
PS. I think you can merge if (q>0.008856) and if (q<=0.008856).
Upvotes: 2