Dakota West

Reputation: 461

Is using arithmetic faster than storing a variable?

In C (or possibly in general), is it faster to compute a value with arithmetic, or to read it from an array/variable?

For example, if I had

int myarray[7] = {16};
int mysixteen = 16;

then I can get 16 in a number of different ways

myarray[#]
mysixteen
16
1 << 4
10 + 6

Logically 16 would be the fastest, but that's not always convenient or plausible for a set of numbers. An example of where this might be relevant is precomputing tables. Say you need bitmasks for 64 bits. You could fill an array

for (int i = 0; i < 64; ++i) {
    /* 1ULL: with a 32-bit int, shifting a plain 1 by 31 or more is undefined */
    mask[i] = 1ULL << i;
}

and make calls to the array in the future, or make a macro

#define mask(b) (1ULL << (b))

and call that.

Upvotes: 2

Views: 243

Answers (2)

Frerich Raabe

Reputation: 94319

In general, any of

  • 16
  • 1 << 4
  • 10 + 6

will result in a literal 16, because the compiler almost certainly implements an optimization called constant folding.
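One way to convince yourself that these expressions are evaluated before the program runs is to use them where C requires a compile-time constant. The sketch below assumes a C11 compiler; the names SIXTEEN_BY_SHIFT, SIXTEEN_BY_SUM, scratch and scratch_length are made up for illustration. Strictly speaking this demonstrates integer constant expressions, which the language requires to be evaluated at translation time, rather than the optimizer's constant folding itself, but the two are closely related.

```c
#include <assert.h>

/* Each expression below is an integer constant expression, so the
   compiler has to evaluate it during translation (C11 or later). */
_Static_assert((1 << 4) == 16, "shift is evaluated at compile time");
_Static_assert((10 + 6) == 16, "sum is evaluated at compile time");

/* Hypothetical names, not from the original post. */
enum { SIXTEEN_BY_SHIFT = 1 << 4, SIXTEEN_BY_SUM = 10 + 6 };

/* A file-scope array needs a constant size, so this only compiles
   because 1 << 4 is known at compile time. */
static int scratch[1 << 4];

/* Number of elements in scratch, computed from its type. */
int scratch_length(void) {
    return (int)(sizeof scratch / sizeof scratch[0]);
}
```

If either _Static_assert were false, the file would fail to compile rather than fail at run time.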

The performance of

  • mysixteen
  • myarray[n]

may be lower, depending on where the values of those variables are stored. In memory? If so, is that memory in any of the CPU caches? Or is the value held in one of the CPU registers? There's no definitive answer.

In general, for a specific program, you can always look at what your compiler gives you, but note that this may change a lot depending on the surrounding code and your optimization flags.

To try it yourself, consider this small program:

int f() { return 16; }

int g() { return 1 << 4; }

int h() { return 10 + 6; }

int i() {
    int myarray[7] = { 16 };
    return myarray[3];
}

int j() {
    int mysixteen = 16;
    return mysixteen;
}

If I compile it using gcc 4.7.2 and then check the disassembly, like

$ gcc -c so19802742.c -o so19802742.o
$ objdump --disassemble so19802742.o

I get this:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 10 00 00 00          mov    $0x10,%eax
   9:   5d                      pop    %rbp
   a:   c3                      retq   

000000000000000b <g>:
   b:   55                      push   %rbp
   c:   48 89 e5                mov    %rsp,%rbp
   f:   b8 10 00 00 00          mov    $0x10,%eax
  14:   5d                      pop    %rbp
  15:   c3                      retq   

0000000000000016 <h>:
  16:   55                      push   %rbp
  17:   48 89 e5                mov    %rsp,%rbp
  1a:   b8 10 00 00 00          mov    $0x10,%eax
  1f:   5d                      pop    %rbp
  20:   c3                      retq   

0000000000000021 <i>:
  21:   55                      push   %rbp
  22:   48 89 e5                mov    %rsp,%rbp
  25:   48 c7 45 e0 00 00 00    movq   $0x0,-0x20(%rbp)
  2c:   00 
  2d:   48 c7 45 e8 00 00 00    movq   $0x0,-0x18(%rbp)
  34:   00 
  35:   48 c7 45 f0 00 00 00    movq   $0x0,-0x10(%rbp)
  3c:   00 
  3d:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
  44:   c7 45 e0 10 00 00 00    movl   $0x10,-0x20(%rbp)
  4b:   8b 45 ec                mov    -0x14(%rbp),%eax
  4e:   5d                      pop    %rbp
  4f:   c3                      retq   

0000000000000050 <j>:
  50:   55                      push   %rbp
  51:   48 89 e5                mov    %rsp,%rbp
  54:   c7 45 fc 10 00 00 00    movl   $0x10,-0x4(%rbp)
  5b:   8b 45 fc                mov    -0x4(%rbp),%eax
  5e:   5d                      pop    %rbp
  5f:   c3                      retq   

Note how, due to constant folding, f, g and h yield exactly the same machine code. The array access in i produces the most machine code (though not necessarily the slowest!), and j falls somewhere in between.

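However, this is without any optimization at all! The code generated when compiling with e.g. -O2 may be totally different, because the compiler can see that each of the five functions simply returns a constant. (Note that i as written returns myarray[3], which is 0 rather than 16: the initializer { 16 } sets only element 0 and zero-fills the rest, as the load from -0x14(%rbp) in the disassembly shows.)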

Upvotes: 2

Paul92

Reputation: 9062

You should not worry about these things. The compiler is smart enough in most cases. Even basic operations like multiplication are sometimes compiled into shifts when that is more efficient.
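As a small illustration of the multiplication case, here is a sketch with two hypothetical helpers (the names are not from the question). For unsigned operands, multiplying by 8 and shifting left by 3 give the same result, so a compiler is free to pick whichever form is cheaper on the target. Whether it actually does so depends on the compiler and flags.

```c
#include <assert.h>

/* Hypothetical helpers: both compute x * 8 for an unsigned x.
   Unsigned arithmetic wraps around, so the results agree even
   when the product does not fit. */
unsigned times_eight_mul(unsigned x)   { return x * 8u; }
unsigned times_eight_shift(unsigned x) { return x << 3; }
```

In practice you would write the multiplication, since it states the intent, and leave the choice of instruction to the compiler.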

As for your example, the array version requires a memory access for every lookup, which can be slow. The macro will be faster in most cases, depending on the number of accesses.
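To make the two alternatives concrete, here is a minimal sketch of both, assuming 64-bit masks stored as uint64_t. The names BIT_MASK, mask_table, init_mask_table, mask_from_table and mask_from_shift are invented for this example. It only shows that the two approaches produce the same values; which one is faster would have to be measured for your program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: UINT64_C(1) keeps the shift in 64 bits,
   and (b) is parenthesized so expressions are safe as arguments. */
#define BIT_MASK(b) (UINT64_C(1) << (b))

static uint64_t mask_table[64];

/* Fill the lookup table once, before any lookups. */
void init_mask_table(void) {
    for (int i = 0; i < 64; ++i) {
        mask_table[i] = BIT_MASK(i);
    }
}

/* Table version: reads the precomputed value from memory. */
uint64_t mask_from_table(int b) { return mask_table[b]; }

/* Computed version: a single shift, no table needed. */
uint64_t mask_from_shift(int b) { return BIT_MASK(b); }
```

Both functions expect b in the range 0 to 63, and init_mask_table must be called before mask_from_table.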

Upvotes: 0
