Reputation: 461
In C (or possibly in general), is it faster to compute a value with arithmetic, or to fetch it from an array or variable?
For example, if I had
int myarray[7] = {16};
int mysixteen = 16;
then I can get 16 in a number of different ways
myarray[0]
mysixteen
16
1 << 4
10 + 6
Logically, 16 would be the fastest, but a literal isn't always convenient or even possible for a set of numbers. An example of where this might be relevant is precomputing tables. Say you need bitmasks for all 64 bit positions; you could fill an array
uint64_t mask[64];
for (int i = 0; i < 64; ++i) {
    mask[i] = 1ULL << i;  /* 1ULL: shifting a plain int by 31 or more bits is undefined */
}
and make calls to the array in the future, or make a macro
#define mask(b) (1ULL << (b))
and call that.
Upvotes: 2
Views: 243
Reputation: 94319
In general, any of
16, 1 << 4, or 10 + 6
will result in a literal 16, because the compiler almost certainly performs an optimization called constant folding.
The performance of
myarray[0] and mysixteen
is potentially lower, depending on where the values of those variables are stored. In memory? If so, is that memory in one of the CPU caches? Or is the value held in a CPU register? There's no definitive answer.
In general, for a specific program, you can always check what your compiler gives you, but note that this may change a lot depending on the surrounding code and your optimization flags.
To try it yourself, consider this small program:
int f() { return 16; }
int g() { return 1 << 4; }
int h() { return 10 + 6; }
int i() {
int myarray[7] = { 16 };
return myarray[3]; /* note: this is 0, since only myarray[0] was initialized to 16 */
}
int j() {
int mysixteen = 16;
return mysixteen;
}
If I compile it using gcc 4.7.2 and then check the disassembly, like
$ gcc -c so19802742.c -o so19802742.o
$ objdump --disassemble so19802742.o
I get this:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 10 00 00 00 mov $0x10,%eax
9: 5d pop %rbp
a: c3 retq
000000000000000b <g>:
b: 55 push %rbp
c: 48 89 e5 mov %rsp,%rbp
f: b8 10 00 00 00 mov $0x10,%eax
14: 5d pop %rbp
15: c3 retq
0000000000000016 <h>:
16: 55 push %rbp
17: 48 89 e5 mov %rsp,%rbp
1a: b8 10 00 00 00 mov $0x10,%eax
1f: 5d pop %rbp
20: c3 retq
0000000000000021 <i>:
21: 55 push %rbp
22: 48 89 e5 mov %rsp,%rbp
25: 48 c7 45 e0 00 00 00 movq $0x0,-0x20(%rbp)
2c: 00
2d: 48 c7 45 e8 00 00 00 movq $0x0,-0x18(%rbp)
34: 00
35: 48 c7 45 f0 00 00 00 movq $0x0,-0x10(%rbp)
3c: 00
3d: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
44: c7 45 e0 10 00 00 00 movl $0x10,-0x20(%rbp)
4b: 8b 45 ec mov -0x14(%rbp),%eax
4e: 5d pop %rbp
4f: c3 retq
0000000000000050 <j>:
50: 55 push %rbp
51: 48 89 e5 mov %rsp,%rbp
54: c7 45 fc 10 00 00 00 movl $0x10,-0x4(%rbp)
5b: 8b 45 fc mov -0x4(%rbp),%eax
5e: 5d pop %rbp
5f: c3 retq
Note how, due to constant folding, f, g and h yield exactly the same machine code. The array access in i produces the most machine code (but not necessarily the slowest!), and j is somewhere in between.
However, this is without any of the more advanced code optimizations enabled! The code generated when compiling with e.g. -O2 may be totally different, because the compiler notices that each of the five functions just returns a compile-time constant (16 in most cases; i actually returns 0, since only myarray[0] was initialized to 16).
Upvotes: 2
Reputation: 9062
You should not worry about these things; the compiler is smart enough in most cases. Even basic operations like multiplication are sometimes optimized into shifts, because that is more efficient.
Speaking about your example: the array version requires a memory access, which is slow if the table is not already in a cache, while the macro computes the value in a register. The macro will be faster in most cases, depending on the number of accesses.
Upvotes: 0