Steve Jobs

Reputation: 181

What is a good general rule for when to use pointers for arrays versus using access operators?

Access operator example:

void print_elements(int *arr, int n)
{
    for (int k = 0; k < n; ++k) printf("%d\n", arr[k]);
}

Pointer arithmetic example:

void print_elements(int *arr, int n)
{
    for (int *pa = arr, *pb = arr + n; pa != pb; ++pa) printf("%d\n", *pa);
}

The second involves 2 variables for the loop, pa and pb, whereas the first only involves 1 extra variable, k. Then again, in the second you're doing

increment pa by 1, dereference

whereas in the first one you're doing

increment k by 1, add k to the pointer arr, dereference

so that the second one uses fewer operations to iterate through the loop.

All things considered, which is faster? Is the comparison < faster than != for integral types? Using < seems "safer", since there's always that irrational fear that a condition like pa != pb could fail because pa could jump over pb if the algorithm inside the loop is restructured. Is incrementing a pointer by 1 generally faster than incrementing it by something greater than 1? I want to consider everything possible.

Upvotes: 0

Views: 91

Answers (3)

alk

Reputation: 70981

As others have pointed out, performance considerations do not favor one notation over the other for accessing array elements.

If there is more than one way to express the same thing, use the one that is easier to understand.

A matter of taste again.

I prefer the index operator [] and write a[1], as long as I do not need to take the address of an array element. In the latter case I use the + operator, that is, a + 1 instead of &a[1].

Upvotes: 1

Lundin

Reputation: 215305

What is a good general rule for when to use pointers for arrays versus using access operators?

Always use array indexing syntax whenever possible, because it is more readable. The pointer arithmetic syntax is far harder to read and should generally not be used for iterations.

The second involves 2 variables

The number of variables used in the source code is a very poor measurement of performance and memory consumption. The actual machine code will need to store results somewhere, whether you declare variables explicitly or not. And if you declare more variables than the machine code actually needs, the compiler will most likely optimize them away.

The loop will need to know when to end the iteration. It can do that either by calculating arr + n at runtime on every iteration (unlikely, because that would be slow) or by saving arr + n in a temporary memory location before starting the loop. In the first example you declared no such variable, so there will likely be an unnamed variable in the actual machine code serving this purpose, making both examples equivalent.

so that the second one uses fewer operations to iterate through the loop

Not really, no. The C standard guarantees that arr[i] is 100% equivalent to *(arr + i). The array syntax is just "syntactic sugar". It is extremely likely that both cases will generate identical machine code.

(The above equivalence rule is the reason why C allows some weird, ugly crap)

All things considered, which is faster?

They are equally fast, for the above mentioned reason.

Is the comparator < faster than != for integral types?

Generally there should be no difference. It all boils down to which assembler instructions are available on the given CPU. All manner of comparisons are most likely equally fast, except comparisons against 0, which may save you a few nanoseconds on some CPUs.

I want to consider everything possible.

Then I would strongly recommend that you consider the readability and maintainability of your program instead of manual micro-optimizations. The former is what makes a good programmer today, not the latter. We aren't in the 1980s any longer.

Upvotes: 3

uesp

Reputation: 6204

This falls into the area of "premature optimization". Unless you have a specific (measured) reason, you should generally prefer the simplest and most straightforward code to begin with (your first function in this case). Chances are that any compiler these days will optimize both functions to be roughly the same anyway, if not exactly the same.

If you suspect some optimizations can be made, you should measure things first by benchmarking/profiling, although even in a simple function like this you have to be careful, as you can easily get misleading results.

If we replace your functions with the following:

volatile int Output = 0;

void print_elements1(int * arr, int n)
{
    for (int k = 0; k < n; ++k) Output += arr[k];
}

void print_elements2(int *arr, int n)
{
    for (int *pa = arr, *pb = arr + n; pa != pb; ++pa) Output += *pa;
}

The reason for this replacement is that printf() is "slow", and we would really only be testing its speed if we benchmarked your original functions. Testing these functions with an array size of 100 million to get decent/repeatable timings, we get:

  • print_elements1() = 560 ms
  • print_elements2() = 230 ms

Ah, we see that the pointer access is more than twice as fast as the array index... but not so fast! Let's reverse the test order and see what we get:

  • print_elements2() = 650 ms
  • print_elements1() = 260 ms

Huh... now the pointer access is twice as slow! What's going on? I don't know exactly, but it is likely due to the CPU/memory cache. We can try to eliminate this effect by running both functions once before the benchmarks:

  • print_elements1() = 230 ms
  • print_elements2() = 230 ms

Identical times, at least within the margin of error of my benchmark timer.

The moral of the story is that compilers and computers these days are complicated machines and chances are that your compiler will do a better job at optimizing most code than you. When you do optimize, measure things first via profiling/benchmarking in order to determine the most effective areas of the code to work on (if any).

Upvotes: 3
