user110219

Reputation: 163

How to count clock cycles at -O3 using Code::Blocks 16.01 in Windows 10?

In my C program I am counting clock cycles on a 64-bit Intel Core i5-2410M (Sandy Bridge) machine running Windows 10 Home, but I see something strange. I compile the program at both -O2 and -O3 using Code::Blocks (CB) 16.01 in a release build. At -O2 the cycle count looks reasonable, but at -O3 it is reported as 0. For now I am not taking Turbo Boost and Hyper-Threading into account, but I will definitely disable them later.

I use the following commands for compiling:

mingw32-gcc.exe -Wall -O2 -m32 -IC:\GMP\include -c "E:\abc\main.c" -o obj\Release\main.o
mingw32-gcc.exe -Wall -O3 -m32 -IC:\GMP\include -c "E:\abc\main.c" -o obj\Release\main.o

The function being timed is:

void schoolbook_9(int32_t *X, int32_t *Y, int64_t *Z){
Z[0] = (int64_t)X[0]*Y[0]  + (int64_t)X[1]*Y[1]  + (int64_t)X[2]*Y[2]  + (int64_t)X[3]*Y[3] + (int64_t)X[4]*Y[4] + (int64_t)X[5]*Y[5] + (int64_t)X[6]*Y[6] + (int64_t)X[7]*Y[7] + (int64_t)X[8]*Y[8];
Z[1] = (int64_t)X[9]*Y[0]  + (int64_t)X[0]*Y[1]  + (int64_t)X[1]*Y[2]  + (int64_t)X[2]*Y[3] + (int64_t)X[3]*Y[4] + (int64_t)X[4]*Y[5] + (int64_t)X[5]*Y[6] + (int64_t)X[6]*Y[7] + (int64_t)X[7]*Y[8];
Z[2] = (int64_t)X[10]*Y[0] + (int64_t)X[9]*Y[1]  + (int64_t)X[0]*Y[2]  + (int64_t)X[1]*Y[3] + (int64_t)X[2]*Y[4] + (int64_t)X[3]*Y[5] + (int64_t)X[4]*Y[6] + (int64_t)X[5]*Y[7] + (int64_t)X[6]*Y[8];
Z[3] = (int64_t)X[11]*Y[0] + (int64_t)X[10]*Y[1] + (int64_t)X[9]*Y[2]  + (int64_t)X[0]*Y[3] + (int64_t)X[1]*Y[4] + (int64_t)X[2]*Y[5] + (int64_t)X[3]*Y[6] + (int64_t)X[4]*Y[7] + (int64_t)X[5]*Y[8];
Z[4] = (int64_t)X[12]*Y[0] + (int64_t)X[11]*Y[1] + (int64_t)X[10]*Y[2] + (int64_t)X[9]*Y[3] + (int64_t)X[0]*Y[4] + (int64_t)X[1]*Y[5] + (int64_t)X[2]*Y[6] + (int64_t)X[3]*Y[7] + (int64_t)X[4]*Y[8];
Z[5] = (int64_t)X[13]*Y[0] + (int64_t)X[12]*Y[1] + (int64_t)X[11]*Y[2] + (int64_t)X[10]*Y[3] + (int64_t)X[9]*Y[4] + (int64_t)X[0]*Y[5] + (int64_t)X[1]*Y[6] + (int64_t)X[2]*Y[7] + (int64_t)X[3]*Y[8];
Z[6] = (int64_t)X[14]*Y[0] + (int64_t)X[13]*Y[1] + (int64_t)X[12]*Y[2] + (int64_t)X[11]*Y[3] + (int64_t)X[10]*Y[4] + (int64_t)X[9]*Y[5] + (int64_t)X[0]*Y[6] + (int64_t)X[1]*Y[7] + (int64_t)X[2]*Y[8];
Z[7] = (int64_t)X[15]*Y[0] + (int64_t)X[14]*Y[1] + (int64_t)X[13]*Y[2] + (int64_t)X[12]*Y[3] + (int64_t)X[11]*Y[4] + (int64_t)X[10]*Y[5] + (int64_t)X[9]*Y[6] + (int64_t)X[0]*Y[7] + (int64_t)X[1]*Y[8];
Z[8] = (int64_t)X[16]*Y[0] + (int64_t)X[15]*Y[1] + (int64_t)X[14]*Y[2] + (int64_t)X[13]*Y[3] + (int64_t)X[12]*Y[4] + (int64_t)X[11]*Y[5] + (int64_t)X[10]*Y[6] + (int64_t)X[9]*Y[7] + (int64_t)X[0]*Y[8];}

I'm counting the clock cycles as follows:

int32_t X[17], Y[9];
int64_t Z[9];
utype64 start, end;
uint32_t i;

srand(time(NULL));
for(i=0; i<17; i++)
    X[i] = rand()%(uint32_t)pow(2.0, 29);
srand(time(NULL));
for(i=0; i<9; i++)
    Y[i] = rand()%(uint32_t)pow(2.0, 29);

start=rdtsc();
end=rdtscp();
start=rdtsc();
for(i=0; i<10000000; i++)
    schoolbook_9(X, Y, Z);

end=rdtscp();
printf("\n%s%"PRIu64"\n", "The cycles count using SB of size 9 is :: ", (end-start)/10000000);

I'm using the rdtscp instruction because my system supports it, but it may not be available on a 32-bit machine, so I have tested my program both with and without rdtscp. The arguments X, Y, and Z are arrays; X and Y hold 32-bit elements and Z holds 64-bit elements.

So my question is: how do I get the cycle count at -O3? With the current code I get 0 cycles.

The flag -ftree-loop-vectorize is enabled at -O3, as described at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. Does that mean the loop has been vectorized? If so, how can I determine the vector length (4 elements, 6 elements, etc.)?

Upvotes: 1

Views: 191

Answers (1)

Stargateur

Reputation: 26757

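It's because end - start is lower than 10000000 with -O3, i.e. the loop averaged less than one cycle per call. The integer division therefore truncates to 0. Print the quotient and the remainder separately: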

utype64 result = end - start;
utype64 cycle = 10000000;
utype64 total = result / cycle;
utype64 rest = result % cycle;
printf("The cycles count using SB of size 9 is %" PRIu64
       " and the rest is %" PRIu64 "\n",
       total, rest);
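
As an alternative to printing quotient and remainder, the average can be computed in floating point so that a total below the iteration count still shows up as a fraction. This is only a minimal sketch: the helper names average_cycles and print_cycle_report are hypothetical, and it assumes the asker's utype64 is an unsigned 64-bit type compatible with uint64_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: average cycles per call as a double, so a total
   smaller than the iteration count does not truncate to 0. */
double average_cycles(uint64_t start, uint64_t end, uint64_t iterations)
{
    assert(iterations != 0);            /* guard against division by zero */
    return (double)(end - start) / (double)iterations;
}

/* Hypothetical helper: print the raw total next to the average, so a
   suspiciously small total is visible at a glance. */
void print_cycle_report(uint64_t start, uint64_t end, uint64_t iterations)
{
    printf("total cycles: %" PRIu64 ", average per call: %.4f\n",
           end - start, average_cycles(start, end, iterations));
}
```

Calling print_cycle_report(start, end, 10000000) after the timed loop would replace the original printf. Seeing the raw total also makes it easier to judge whether the measured loop did any real work.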

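Also, you should not call srand(time(NULL)); twice. The second call is useless at best: time(NULL) typically has one-second resolution, so both calls almost always use the same seed, and reseeding with the same value restarts the same sequence. In your code that makes Y repeat the first 9 values of X.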
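
The reseeding effect can be checked with a small sketch like the one below. The function names fill_random, reseeding_repeats and fill_inputs are hypothetical, and the seed is passed in as a parameter (rather than taken from time(NULL)) only to make the behavior reproducible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill buf with n pseudo-random values in [0, 2^29),
   the range used in the question. */
void fill_random(int32_t *buf, size_t n)
{
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < n; i++)
        buf[i] = (int32_t)(rand() % (1 << 29));
}

/* Returns 1 when reseeding with the same value restarts the same sequence,
   which is what two srand(time(NULL)) calls in the same second do. */
int reseeding_repeats(unsigned seed)
{
    int32_t a[9], b[9];
    size_t i;

    srand(seed);
    fill_random(a, 9);
    srand(seed);                /* same seed again: sequence starts over */
    fill_random(b, 9);

    for (i = 0; i < 9; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Seed once, then draw both inputs from one continuous sequence. */
void fill_inputs(int32_t X[17], int32_t Y[9], unsigned seed)
{
    srand(seed);
    fill_random(X, 17);
    fill_random(Y, 9);
}
```

In the original program, a single srand(time(NULL)); before the first loop has the same effect as fill_inputs: Y then continues the sequence where X left off instead of repeating it.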

Note: I can't test this myself.

Upvotes: 1
