user1903865

Reputation: 11

Inline assembler execution time

How can I measure the execution time of a C++ inline assembly block? My problem is that the difference between the two GetTickCount() values is always 0.

Here is my cpp code:

const int N = 100000;
short x[4*N];
short a[4*N];

for (int j = 0; j < 4*N; j++) {
    x[j] = rand() % 1000;
    a[j] = rand() % 5000;
}

DWORD dwAStart = GetTickCount();

__asm {
    xor eax, eax
    mov ecx, N
    xor esi, esi
a1:
    emms
    movq        mm1, qword ptr x[esi]
    movq        mm2, mm1
    punpcklwd   mm1, mm6
    punpckhwd   mm2, mm6
    movq        mm0, qword ptr a[esi]
    movq        mm3, mm0
    punpcklwd   mm0, mm6
    punpckhwd   mm3, mm6
    pmullw      mm0, mm1
    paddsw      mm0, mm3
    add esi, 8
    loop a1
};

DWORD dwAInterval = GetTickCount() - dwAStart;
printf("Operation is completed through %d ms (Assembler)!\n", (int)dwAInterval);

Upvotes: 1

Views: 600

Answers (3)

Michael Pellegrino

Reputation: 11

I used:

#include <chrono>
#include <iostream>
using namespace std;
typedef std::chrono::high_resolution_clock Clock;
const int TRIALS = 1000; // assumed value; TRIALS was not defined in the original snippet
int main()
{
  int X4,sum,avg;
  auto t1 = Clock::now();
  auto t2 = Clock::now();
  sum=avg=0;
  for( int i=0; i<TRIALS; i++ )
    {
      X4=17;
      t1 = Clock::now();
      asm  (
            "movl %0, %%eax;" // X->ax
            "movl $0x0A, %%ebx;" // 10->bx
            "mul %%ebx;" // 10*ax->ax (high half of the product goes to dx)
            : "=a" (X4)
            : "a" (X4)
            : "%ebx", "%edx", "cc" // mul also overwrites edx and the flags
            );
      t2 = Clock::now();
      sum+=chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
    }
  avg=sum/TRIALS;
  cout << "| Product:  " << X4<< "  "<< avg << " nanoseconds |" << endl;
}

Upvotes: 0

Carey Gregory

Reputation: 6846

As GregS points out, GetTickCount is far too coarse for timing short sequences of code; its resolution is typically 10 to 16 ms. The Time Stamp Counter found on x86 processors, in turn, has limitations that make it very unreliable on multi-core processors. The most reliable solution on Windows is the pair of functions QueryPerformanceCounter and QueryPerformanceFrequency. On *nix platforms, the POSIX function clock_gettime() serves a similar purpose.
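To make that concrete, here is a minimal sketch of how the two APIs might be wrapped behind one helper. The names now_ns and average_ns are my own (hypothetical), and the choice of CLOCK_MONOTONIC on the POSIX side is an assumption; treat this as an illustration rather than a tuned benchmark harness.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

// Hypothetical helper: current reading of a high-resolution clock, in nanoseconds.
inline std::int64_t now_ns() {
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);   // counter ticks per second
    QueryPerformanceCounter(&count);    // current tick count
    // Convert whole seconds and the remainder separately to avoid overflow.
    const std::int64_t whole = count.QuadPart / freq.QuadPart;
    const std::int64_t rest  = count.QuadPart % freq.QuadPart;
    return whole * 1000000000LL + rest * 1000000000LL / freq.QuadPart;
#else
    timespec ts;
    const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);  // assumed clock id
    assert(rc == 0);
    (void)rc;
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
#endif
}

// Hypothetical helper: runs work() `repeats` times and returns the average
// time per call in nanoseconds. Repeating the work keeps the measured span
// well above the clock's own resolution and call overhead.
template <class F>
double average_ns(F&& work, int repeats) {
    assert(repeats > 0);
    const std::int64_t start = now_ns();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const std::int64_t stop = now_ns();
    return static_cast<double>(stop - start) / repeats;
}
```

In the question's case, the __asm block would go inside the callable passed to average_ns, for example a lambda, with a repeat count large enough that the total run takes at least a few milliseconds.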

Upvotes: 2

President James K. Polk

Reputation: 41967

Ticks, as counted by GetTickCount(), are too coarse to capture time differences for such short sequences of assembly code. You will have to use the x86 Time Stamp Counter to see the time; it is read with the RDTSC instruction. All the usual caveats apply: your process may get interrupted (which invalidates the counts), the clock frequency may change, and activity on other cores may affect the timing of your core.
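A rough sketch of reading the counter from C++ follows. It uses the __rdtsc() compiler intrinsic rather than hand-written assembly; the helper names read_cycle_counter and min_cycles are hypothetical, and the steady_clock fallback for non-x86 builds is my own addition so that the snippet still compiles there. Taking the minimum over several trials is one common way to discard runs that were disturbed by an interrupt or a core migration; it does not remove the other caveats.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#else
#include <chrono>
#define HAVE_RDTSC 0
#endif

// Hypothetical helper: raw Time Stamp Counter value on x86.
// On other architectures it falls back to steady_clock ticks (an assumption
// made only so the sketch builds everywhere; those are not CPU cycles).
inline std::uint64_t read_cycle_counter() {
#if HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: runs work() `trials` times and returns the smallest
// counter difference observed. A run that was interrupted, or that moved to
// a core with an unsynchronized counter, produces a large (possibly wrapped)
// difference and is therefore ignored by the minimum.
template <class F>
std::uint64_t min_cycles(F&& work, int trials) {
    assert(trials > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < trials; ++i) {
        const std::uint64_t start = read_cycle_counter();
        work();
        const std::uint64_t stop = read_cycle_counter();
        const std::uint64_t elapsed = stop - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

The result is in counter ticks, not milliseconds; converting it to time requires knowing the counter frequency, which is exactly where the frequency caveat bites.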

Upvotes: 2
