Reputation: 11
How can I measure the execution time of C++ inline assembly? My problem is that the difference between the two GetTickCount() values is always 0.
Here is my cpp code:
const int N = 100000;
short x[4*N];
short a[4*N];
for (int j = 0; j < 4*N; j++) {
    x[j] = rand() % 1000;
    a[j] = rand() % 5000;
}
DWORD dwAStart = GetTickCount();
__asm {
    xor eax, eax
    mov ecx, N                  ; loop counter: N iterations, 4 shorts each
    xor esi, esi                ; byte offset into x and a
    pxor mm6, mm6               ; zero mm6 so the unpacks zero-extend (assumed intent; mm6 was never set)
a1:
    movq mm1, qword ptr x[esi]
    movq mm2, mm1
    punpcklwd mm1, mm6
    punpckhwd mm2, mm6
    movq mm0, qword ptr a[esi]
    movq mm3, mm0
    punpcklwd mm0, mm6
    punpckhwd mm3, mm6
    pmullw mm0, mm1
    paddsw mm0, mm3
    add esi, 8
    loop a1
    emms                        ; clear the MMX state once, after the loop
}
DWORD dwAInterval = GetTickCount() - dwAStart;
printf("Operation completed in %d ms (Assembler)!\n", (int)dwAInterval);
Upvotes: 1
Views: 600
Reputation: 11
I used:
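#include <chrono>
#include <iostream>
using namespace std;
typedef std::chrono::high_resolution_clock Clock;
const int TRIALS = 1000000; // repetition count; this value is an assumption, the original post did not define it
int main()
{
    int X4 = 0;
    long long sum = 0, avg = 0; // 64-bit to avoid overflowing the nanosecond total
    auto t1 = Clock::now();
    auto t2 = Clock::now();
    for( int i=0; i<TRIALS; i++ )
    {
        X4=17;
        t1 = Clock::now();
        asm volatile (                 // volatile keeps the block between the two clock reads
            "movl %0, %%eax;"          // X -> eax (already there via the "a" constraint)
            "movl $0x0A, %%ebx;"       // 10 -> ebx
            "mul %%ebx;"               // eax * ebx -> edx:eax
            : "=a" (X4)
            : "a" (X4)
            : "%ebx", "%edx", "cc"     // mul also overwrites edx and the flags
        );
        t2 = Clock::now();
        sum+=chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
    }
    avg=sum/TRIALS;
    cout << "| Product: " << X4<< " "<< avg << " nanoseconds |" << endl;
}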
Upvotes: 0
Reputation: 6846
As GregS points out, GetTickCount is far too coarse to use for timing short sequences of code. And the Time Stamp Counter found on x86 processors has limitations that make it very unreliable on multi-core processors. The most reliable solution is the QueryPerformanceCounter and QueryPerformanceFrequency functions. On *nix platforms, the POSIX function clock_gettime() serves a similar purpose.
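A minimal sketch of the clock_gettime() approach, assuming a POSIX system that provides CLOCK_MONOTONIC. The helper names now_ns, sample_work and average_ns are hypothetical and only serve the illustration; sample_work stands in for whatever code is being measured. Many repetitions are timed together so that the measured interval is much longer than the clock's resolution. On Windows the same pattern would presumably be built on QueryPerformanceCounter and QueryPerformanceFrequency instead.

```cpp
#include <time.h>     // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <cstdint>

// Current monotonic time in nanoseconds.
static std::int64_t now_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical stand-in for the code under test: sums 0..n-1.
// The volatile accumulator keeps the compiler from deleting the loop.
static std::uint64_t sample_work(int n) {
    volatile std::uint64_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc = acc + static_cast<std::uint64_t>(i);
    }
    return acc;
}

// Average cost of one sample_work(n) call in nanoseconds,
// measured over `reps` back-to-back calls.
static double average_ns(int reps, int n) {
    const std::int64_t start = now_ns();
    for (int r = 0; r < reps; ++r) {
        sample_work(n);
    }
    const std::int64_t stop = now_ns();
    return static_cast<double>(stop - start) / reps;
}
```

Calling average_ns(1000, 100000), for example, returns the mean cost of one sample_work(100000) call in nanoseconds; the actual figure depends on the machine.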
Upvotes: 2
Reputation: 41967
Ticks, as counted by GetTickCount(), are too coarse to capture the time differences of such short sequences of assembly code. You will have to use the x86 Time Stamp Counter to see the time; the instruction mnemonic is usually RDTSC in assembly. All the usual caveats apply: your process may get interrupted (which invalidates the counts), the clock frequency may change, activity on other cores may affect the timing of your core, and so on.
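A rough sketch of reading the Time Stamp Counter from C++, assuming GCC or Clang on an x86 target, where the __rdtsc() intrinsic comes from <x86intrin.h> (MSVC declares it in <intrin.h>). The names sample_loop and min_cycles are hypothetical. Taking the minimum over several trials is one common way to discard runs that were interrupted; it does not remove the other caveats listed above, and the result is in TSC ticks, which need not equal core clock cycles.

```cpp
#include <cstdint>
#include <limits>
#include <x86intrin.h>   // __rdtsc (GCC/Clang, x86 only; an assumption of this sketch)

// Hypothetical stand-in for the code under test: sums 0..n-1.
// The volatile accumulator keeps the compiler from deleting the loop.
static std::uint64_t sample_loop(int n) {
    volatile std::uint64_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc = acc + static_cast<std::uint64_t>(i);
    }
    return acc;
}

// Smallest TSC difference observed over `trials` runs of sample_loop(n).
static std::uint64_t min_cycles(int trials, int n) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int t = 0; t < trials; ++t) {
        const std::uint64_t start = __rdtsc();
        sample_loop(n);
        const std::uint64_t stop = __rdtsc();
        const std::uint64_t elapsed = stop - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

For example, min_cycles(20, 100000) gives the best-case tick count for one pass of the loop; comparing that number across code variants on the same machine is usually more meaningful than converting it to wall-clock time.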
Upvotes: 2