Josh Alexander

Reputation: 53

Why is using a pointer for a for loop more performant in this case?

I don't have a background in C/C++ or related lower-level languages, so I've never run into pointers before. I'm a game dev working primarily in C#, and this morning I finally decided to move to an unsafe context for some performance-critical sections of code (and please, no "don't use unsafe" answers, which I've read so many times while doing research; it's already yielding around 6 times the performance in certain areas with no issues so far, plus I love the ability to do things like reverse arrays with no allocation). Anyhow, there's a certain situation where I expected no difference, or even a possible decrease in speed, but in reality I'm saving a lot of ticks (double the speed in some instances). This benefit seems to decrease with the number of iterations, which I don't fully understand.
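
For reference, this is roughly the kind of allocation-free reversal I mean; a minimal sketch with an illustrative method name, assuming unsafe blocks are enabled for the project:

static unsafe void ReverseInPlace(int[] values)
{
    if (values.Length < 2)
        return;

    fixed (int* start = values)
    {
        int* left = start;
        int* right = start + values.Length - 1;
        while (left < right)
        {
            // Swap the two ends and walk inward; no temporary array is allocated.
            int tmp = *left;
            *left = *right;
            *right = tmp;
            left++;
            right--;
        }
    }
}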

This is the situation:

int x = 0;
for(int i = 0; i < 100; i++)
    x++;

Takes, on average, about 15 ticks.

EDIT: The following is unsafe code, though I assumed that was a given.

int x = 0, i = 0;
int* i_ptr;
for(i_ptr = &i; *i_ptr < 100; (*i_ptr)++)
    x++;

Takes about 7 ticks, on average.
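
Here's a minimal sketch of a Stopwatch-based harness that times both loops the same way (simplified and not exactly my test code; it assumes unsafe blocks are enabled):

using System;
using System.Diagnostics;

static class LoopTiming
{
    static unsafe void Main()
    {
        var sw = new Stopwatch();

        // Plain counter loop.
        sw.Restart();
        int x = 0;
        for (int i = 0; i < 100; i++)
            x++;
        sw.Stop();
        Console.WriteLine($"plain:   {sw.ElapsedTicks} ticks (x = {x})");

        // Same loop driven through a pointer to the counter.
        sw.Restart();
        int y = 0, j = 0;
        int* j_ptr;
        for (j_ptr = &j; *j_ptr < 100; (*j_ptr)++)
            y++;
        sw.Stop();
        Console.WriteLine($"pointer: {sw.ElapsedTicks} ticks (y = {y})");
    }
}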

As I mentioned, I don't have a low-level background, and I literally just started using pointers directly this morning, so I'm probably missing quite a bit of info. So my first query is: why is the pointer more performant in this case? It isn't an isolated instance, and of course there are a lot of other variables at play on the PC at any specific point in time, but I'm getting these results very consistently across a lot of tests.

In my head, the operations are as such:

No pointer: read i, compare it to 100, increment i, and increment x.

Pointer: read i_ptr, dereference it to get at i, compare that value to 100, dereference again to increment i, and increment x.

In my head, there must surely be more overhead, however ridiculously negligible, from using a pointer here. How is it that a pointer is consistently more performant than the direct variable in this case? These are all on the stack as well, of course, so it's not dependent on where they end up being stored, from what I can tell.

As touched on earlier, the caveat is that this bonus decreases with the number of iterations, and pretty fast. I took out the extremes from the following data to account for background interference.

At 1,000 iterations, the two are identical at 30 to 34 ticks.

At 10,000 iterations, the pointer is slower by about 20 ticks.

Jump up to 10,000,000 iterations, and the pointer is slower by about 10,000 ticks.

My assumption is that the decrease comes from the extra step I covered earlier, the additional lookup, which brings me back to wondering why the pointer is more performant at low loop counts. At the very least, I'd assume they would be more or less identical (which they are in practice, I suppose, but a difference of 8 ticks across millions of repeated tests is pretty definitive to me) up until the rough threshold I found somewhere between 100 and 1,000 iterations.

Apologies if I'm nitpicking somewhat, or if this is a poor question, but I feel as though it will be beneficial to know exactly what is going on under the hood. And if nothing else, I think it's pretty interesting!

Upvotes: 2

Views: 1120

Answers (1)

Josh Alexander

Reputation: 53

Some users suggested that the test results were most likely due to measurement inaccuracies, and that does seem to be the case, at least up to a point. When averaged across ten million continuous tests, the means of the two are typically equal, though in some cases the pointer version averages out to one extra tick. Interestingly, when tested as a single run, the pointer version consistently has a lower execution time. There are of course a lot of additional variables at play at the specific moment a test runs, which makes it somewhat of a pointless pursuit to track this down any further. But the result is that I've learned some more about pointers, which was my primary goal, so I'm pleased with the test.
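
A simplified sketch of that kind of repeated-run averaging, assuming a Stopwatch per run and ten million repeats (not my exact test code; only the plain loop is shown here, and the pointer variant is measured the same way):

using System;
using System.Diagnostics;

static class AveragedTiming
{
    const int Repeats = 10_000_000;

    static void Main()
    {
        var sw = new Stopwatch();
        long totalTicks = 0;
        long checksum = 0;

        for (int run = 0; run < Repeats; run++)
        {
            sw.Restart();
            int x = 0;
            for (int i = 0; i < 100; i++)
                x++;
            sw.Stop();

            totalTicks += sw.ElapsedTicks;
            checksum += x;          // keep the loop's result observable
        }

        Console.WriteLine($"mean ticks per run: {(double)totalTicks / Repeats:F2} (checksum {checksum})");
    }
}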

Upvotes: 1
