Reputation: 93615
Consider the simple code:
UINT64 result;
UINT32 high, low;
...
result = ((UINT64)high << 32) | (UINT64)low;
Do modern compilers turn that into a real barrel shift on high, or optimize it to a simple copy to the right location?
If not, then using a union would seem to be more efficient than the shift that most people appear to use. However, having the compiler optimize this is the ideal solution.
I'm wondering how I should advise people when they do require that extra little bit of performance.
Upvotes: 7
Views: 2530
Reputation: 48304
I wrote the following (hopefully valid) test:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
void func(uint64_t x);
int main(int argc, char **argv)
{
#ifdef UNION
union {
uint64_t full;
struct {
uint32_t low;
uint32_t high;
} p;
} result;
#define value result.full
#else
uint64_t result;
#define value result
#endif
uint32_t high, low;
if (argc < 3) return 0;
high = atoi(argv[1]);
low = atoi(argv[2]);
#ifdef UNION
result.p.high = high;
result.p.low = low;
#else
result = ((uint64_t) high << 32) | low;
#endif
// printf("%08x%08x\n", (uint32_t) (value >> 32), (uint32_t) (value & 0xffffffff));
func(value);
return 0;
}
Running a diff of the unoptimized output of gcc -S (lines marked < are the shift version, lines marked > are the union version):
< mov -4(%rbp), %eax
< movq %rax, %rdx
< salq $32, %rdx
< mov -8(%rbp), %eax
< orq %rdx, %rax
< movq %rax, -16(%rbp)
---
> movl -4(%rbp), %eax
> movl %eax, -12(%rbp)
> movl -8(%rbp), %eax
> movl %eax, -16(%rbp)
I don't know assembly, so it's hard for me to analyze that. However, it looks like some shifting is taking place as expected on the non-union (top) version.
But with optimizations (-O2) enabled, the output was identical. So the same code was generated, and both ways will have the same performance.
(gcc version 4.5.2 on Linux/AMD64)
Partial output of the optimized (-O2) code, with or without the union:
movq 8(%rsi), %rdi
movl $10, %edx
xorl %esi, %esi
call strtol
movq 16(%rbx), %rdi
movq %rax, %rbp
movl $10, %edx
xorl %esi, %esi
call strtol
movq %rbp, %rdi
mov %eax, %eax
salq $32, %rdi
orq %rax, %rdi
call func
The snippet begins immediately after the jump generated by the if line.
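To repeat the comparison without the argv parsing getting in the way, the two approaches can be isolated in separate functions and compiled with gcc -O2 -S. This is only a sketch: the names combine_shift and combine_union are made up for illustration, and the union layout assumes a little-endian target such as AMD64.

```c
#include <stdint.h>

/* Hypothetical helper: widen high to 64 bits, shift it, then OR in low. */
uint64_t combine_shift(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: write the two halves through a union.
 * The field order (low first) matches little-endian targets only. */
uint64_t combine_union(uint32_t high, uint32_t low)
{
    union {
        uint64_t full;
        struct {
            uint32_t low;
            uint32_t high;
        } p;
    } result;

    result.p.low = low;
    result.p.high = high;
    return result.full;
}
```

Comparing the assembly emitted for the two functions shows directly whether a given compiler and optimization level treat them the same.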
Upvotes: 4
Reputation: 32540
EDIT: This response is based on an earlier version of the OP's code that did not have a cast.
This code
result = (high << 32) | low;
is actually going to have undefined behavior: high is a 32-bit value and you are shifting it by 32 bits (the full width of its type), so the result depends on how the compiler and target platform happen to handle the shift. That result is then OR'd with low, so the final value is just as unreliable and will most likely not be the 64-bit value you want. For instance, the code emitted by gcc -S on OSX 10.6 looks like:
movl -4(%rbp), %eax //retrieving the value of "high"
movl $32, %ecx
shal %cl, %eax //performing the 32-bit shift on "high"
orl -8(%rbp), %eax //OR'ing the value of "low" to the shift op result
So you can see that the shift is only taking place on a 32-bit value in a 32-bit register with a 32-bit instruction. The result ends up being exactly the same as high | low without any shifting at all, because x86 masks the count of a 32-bit shift to its low 5 bits, so in this case shal $32, %eax just returns the value that was originally in EAX. You're not getting a 64-bit result.
In order to avoid that, cast high to a uint64_t before shifting, like:
result = ((uint64_t)high << 32) | low;
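As a quick way to convince yourself that the cast version keeps both halves, here is a small round-trip sketch. The helper names make_u64, high_half and low_half are hypothetical and not part of the OP's code.

```c
#include <stdint.h>

/* Hypothetical helper: the cast widens high before the shift,
 * so the shift is done in 64 bits and a count of 32 is valid. */
uint64_t make_u64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: recover the upper 32 bits. */
uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Hypothetical helper: recover the lower 32 bits. */
uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}
```

If splitting the combined value gives back the original high and low, no bits were lost in the shift.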
Upvotes: 2
Reputation: 27470
If this is supposed to be platform independent, then the only option is to use shifts here.
With union { r64; struct { low; high } }
you cannot tell which half of the 64-bit value the low/high fields will map to. Think about endianness.
Modern compilers are pretty good at handling such shifts.
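To make the endianness point concrete, here is a rough sketch of the extra work a union needs before it gives the same answer on both little- and big-endian machines. The names split64, first_word_is_low and combine_union_any_endian are invented for this example, and it assumes the target is either plain little-endian or plain big-endian (no mixed layouts).

```c
#include <stdint.h>

/* Hypothetical union: the same 8 bytes viewed as one 64-bit value
 * or as two 32-bit words in memory order. */
union split64 {
    uint64_t full;
    uint32_t word[2];
};

/* Returns 1 when the first word in memory holds the low half
 * (little-endian), 0 when it holds the high half (big-endian). */
int first_word_is_low(void)
{
    union split64 u;
    u.full = 0x1122334455667788ULL;
    return u.word[0] == 0x55667788u;
}

/* Union-based combine that picks the word index at run time,
 * which is exactly the bookkeeping the shift version avoids. */
uint64_t combine_union_any_endian(uint32_t high, uint32_t low)
{
    union split64 u;
    int low_index = first_word_is_low() ? 0 : 1;

    u.word[low_index] = low;
    u.word[1 - low_index] = high;
    return u.full;
}
```

The shift expression needs none of this, because shifts operate on values rather than on the in-memory byte layout.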
Upvotes: 4
Reputation: 76127
Modern compilers are smarter than you might think ;-) (so yes, I think you can expect a barrel shift on any decent compiler).
Anyway, I would use the option whose semantics are closer to what you are actually trying to do.
Upvotes: 4