user977648

Reputation: 171

Why is my array index faster than my pointer?

Why is the array index faster than the pointer? Isn't a pointer supposed to be faster than an array index?

I used time.h's clock_t to time the two functions, each looping 2 million times.

Pointer time : 0.018995

Index time : 0.017864

void myPointer(int a[], int size)
{
     int *p;
     for(p = a; p < &a[size]; p++)
     {
         *p = 0;
     }
}


void myIndex(int a[], int size)
{
     int i;
     for(i = 0; i < size; i++)
     {
         a[i] = 0;
     }
}

Upvotes: 17

Views: 10499

Answers (10)

Compiler optimizations are pattern matching.

When your compiler optimizes, it looks for known code patterns, and then transforms the code according to some rule. Your two code snippets seem to trigger different transforms, and thus produce slightly different code.

This is one of the reasons why we always insist on actually measuring the resulting performance when it comes to optimizations: you can never be sure what your compiler turns your code into unless you test it.


If you are really curious, try compiling your code with gcc -S -Os; this produces the most readable yet still optimized assembler code. For your two functions, I get the following assembler:

pointer code:
.L2:
    cmpq    %rax, %rdi
    jnb .L5
    movl    $0, (%rdi)
    addq    $4, %rdi
    jmp .L2
.L5:

index code:
.L7:
    cmpl    %eax, %esi
    jle .L9
    movl    $0, (%rdi,%rax,4)
    incq    %rax
    jmp .L7
.L9:

The differences are slight, but may indeed trigger a performance difference; most importantly, the difference between using addq and incq could be significant.

Upvotes: 1

Tiger4Hire

Reputation: 1091

This is a very hard thing to time, because compilers are very good at optimising these things. Still, it's better to give the compiler as much information as possible, which is why in this case I'd advise using std::fill and letting the compiler choose.

But... If you want to get into the detail

a) CPUs normally give pointer+offset addressing for free, e.g.: mov r1, r2(r3).
b) This means an index operation requires just one extra instruction: mul r3,r1,size. That's just one cycle extra per loop.
c) CPUs often provide stall/delay slots, meaning you can often hide single-cycle operations.

All in all, even if your loops are very large, the cost of the access is nothing compared to the cost of even a few cache misses. You are best advised to optimise your structures before you care about loop costs. Try, for example, packing your structures to reduce the memory footprint first.

Upvotes: 0

Varun Chhangani

Reputation: 1206

Accessing the data through an array index or through a pointer is exactly equivalent. Walk through the program below with me.

There is a loop that runs 100 times, but when we look at the disassembled code, the pointer access needs fewer instructions than the array-index access.

That doesn't mean accessing data through a pointer is faster, though; it really depends on the instructions the compiler generates. Both forms compute an address: the array index accesses the value at a base plus an offset and increments the offset, while the pointer holds the address directly.

#include <stdio.h>

void fun1(int a[], int n);
void fun2(int *p, int n);

int main(void)
{
    int a[100];
    fun1(a, 100);
    fun2(&a[0], 5);
    return 0;
}

void fun1(int a[], int n)
{
    int i;
    for (i = 0; i <= 99; i++)
    {
        a[i] = 0;
        printf("%d\n", a[i]);
    }
}

void fun2(int *p, int n)
{
    int i;
    for (i = 0; i <= 99; i++)
    {
        *p = 0;                  /* note: stores through *p, not *(p + i) */
        printf("%d\n", *(p + i));
    }
}


disass fun1
Dump of assembler code for function fun1:
   0x0804841a <+0>: push   %ebp
   0x0804841b <+1>: mov    %esp,%ebp
   0x0804841d <+3>: sub    $0x28,%esp
   0x08048420 <+6>: movl   $0x0,-0xc(%ebp)
   0x08048427 <+13>:    jmp    0x8048458 <fun1+62>
   0x08048429 <+15>:    mov    -0xc(%ebp),%eax
   0x0804842c <+18>:    shl    $0x2,%eax
   0x0804842f <+21>:    add    0x8(%ebp),%eax
   0x08048432 <+24>:    movl   $0x0,(%eax)
   0x08048438 <+30>:    mov    -0xc(%ebp),%eax
   0x0804843b <+33>:    shl    $0x2,%eax
   0x0804843e <+36>:    add    0x8(%ebp),%eax
   0x08048441 <+39>:    mov    (%eax),%edx
   0x08048443 <+41>:    mov    $0x8048570,%eax
   0x08048448 <+46>:    mov    %edx,0x4(%esp)
   0x0804844c <+50>:    mov    %eax,(%esp)
   0x0804844f <+53>:    call   0x8048300 <printf@plt>
   0x08048454 <+58>:    addl   $0x1,-0xc(%ebp)
   0x08048458 <+62>:    cmpl   $0x63,-0xc(%ebp)
   0x0804845c <+66>:    jle    0x8048429 <fun1+15>
   0x0804845e <+68>:    leave  
   0x0804845f <+69>:    ret    
End of assembler dump.
(gdb) disass fun2
Dump of assembler code for function fun2:
   0x08048460 <+0>: push   %ebp
   0x08048461 <+1>: mov    %esp,%ebp
   0x08048463 <+3>: sub    $0x28,%esp
   0x08048466 <+6>: movl   $0x0,-0xc(%ebp)
   0x0804846d <+13>:    jmp    0x8048498 <fun2+56>
   0x0804846f <+15>:    mov    0x8(%ebp),%eax
   0x08048472 <+18>:    movl   $0x0,(%eax)
   0x08048478 <+24>:    mov    -0xc(%ebp),%eax
   0x0804847b <+27>:    shl    $0x2,%eax
   0x0804847e <+30>:    add    0x8(%ebp),%eax
   0x08048481 <+33>:    mov    (%eax),%edx
   0x08048483 <+35>:    mov    $0x8048570,%eax
   0x08048488 <+40>:    mov    %edx,0x4(%esp)
   0x0804848c <+44>:    mov    %eax,(%esp)
   0x0804848f <+47>:    call   0x8048300 <printf@plt>
   0x08048494 <+52>:    addl   $0x1,-0xc(%ebp)
   0x08048498 <+56>:    cmpl   $0x63,-0xc(%ebp)
   0x0804849c <+60>:    jle    0x804846f <fun2+15>
   0x0804849e <+62>:    leave  
   0x0804849f <+63>:    ret    
End of assembler dump.
(gdb) 

Upvotes: 0

Agnius Vasiliauskas

Reputation: 11267

Oops, on my 64-bit system the results are quite different. I found that this

 int i;

 for(i = 0; i < size; i++)
 {
     *(a+i) = 0;
 }

is about 100 times (!) slower than this

 int i;
 int * p = a;

 for(i = 0; i < size; i++)
 {
     *(p++) = 0;
 }

when compiled with -O3. This hints to me that moving to the next address is somehow far easier for a 64-bit CPU than calculating the destination address from an offset. But I'm not sure.

EDIT:
This really does have something to do with the 64-bit architecture, because the same code with the same compile flags doesn't show any real performance difference on a 32-bit system.

Upvotes: 1

shenles

Reputation: 582

It looks like the index solution can save a few instructions with the compare in the for loop.

Upvotes: 0

minjang

Reputation: 9050

No, pointers are not inherently faster than array indexing. If one version of the code is faster than the other, it's mostly because the address computations differ. The question should also provide the compiler and optimization flags, as they can heavily affect performance.

Array indexing in your context (where the array bound is not known) is exactly identical to the pointer operation. From a compiler's viewpoint, it is just a different expression of pointer arithmetic. Here is an example of optimized x86 code from Visual Studio 2010 with full optimization and no inlining.

     3: void myPointer(int a[], int size)
     4: {
013E1800  push        edi  
013E1801  mov         edi,ecx  
     5:      int *p;
     6:      for(p = a; p < &a[size]; p++)
013E1803  lea         ecx,[edi+eax*4]  
013E1806  cmp         edi,ecx  
013E1808  jae         myPointer+15h (13E1815h)  
013E180A  sub         ecx,edi  
013E180C  dec         ecx  
013E180D  shr         ecx,2  
013E1810  inc         ecx  
013E1811  xor         eax,eax  
013E1813  rep stos    dword ptr es:[edi]  
013E1815  pop         edi  
     7:      {
     8:          *p = 0;
     9:      }
    10: }
013E1816  ret 

    13: void myIndex(int a[], int size)
    14: {
    15:      int i;
    16:      for(i = 0; i < size; i++)
013E17F0  test        ecx,ecx  
013E17F2  jle         myIndex+0Ch (13E17FCh)  
013E17F4  push        edi  
013E17F5  xor         eax,eax  
013E17F7  mov         edi,edx  
013E17F9  rep stos    dword ptr es:[edi]  
013E17FB  pop         edi  
    17:      {
    18:          a[i] = 0;
    19:      }
    20: }
013E17FC  ret 

At a glance, myIndex looks faster because it has fewer instructions; however, the two pieces of code are essentially the same. Both eventually use rep stos, x86's repeating (loop) store instruction. The only difference is in the computation of the loop bound. The for loop in myIndex uses the trip count size as-is (no computation is needed), but myPointer needs some computation to derive the trip count. This is the only difference; the important loop operations are identical. Thus, the difference is negligible.

To summarize, the performance of myPointer and myIndex in an optimized code should be identical.


FYI, if the array's bound is known at compile time, e.g., int A[constant_expression], then accesses on this array may be much faster than pointer-based ones. This is mostly because the array accesses are free from the pointer-analysis problem. Compilers can perfectly compute the dependency information for computations and accesses on a fixed-size array, so they can do advanced optimizations including automatic parallelization.

However, if the computations are pointer based, compilers must perform pointer analysis for further optimization, which is quite limited in C/C++. It generally ends up with conservative results and fewer optimization opportunities.

Upvotes: 10

Anthony Blake

Reputation: 5348

I would suggest running each loop 200 million times, then repeating each measurement 10 times and taking the fastest. That will factor out effects from OS scheduling and so on.

I would then suggest you disassemble the code for each loop.

Upvotes: 2

djhaskin987

Reputation: 10057

The times are so close together that if you ran them repeatedly, you might not see much of a difference. Both code segments compile to the exact same assembly. By definition, there is no difference.

Upvotes: 0

Ron

Reputation: 1315

It may be the comparison in the for loop that is causing the difference. The termination condition is tested on each iteration, and your "pointer" example has a slightly more complicated termination condition (computing &a[size]). Since &a[size] does not change, you could try assigning it to a variable to avoid recalculating it on each iteration of the loop.

Upvotes: 5

moshbear

Reputation: 3322

Array dereference p[i] is *(p + i). Compilers make use of instructions that do math + dereference in 1 or 2 cycles (e.g. x86 LEA instruction) to optimize for speed.

With the pointer loop, it splits the access and the offset into two separate parts and the compiler cannot optimize it.

Upvotes: 5
