Is a union more efficient than a shift on modern compilers?

Consider the simple code:

UINT64 result;
UINT32 high, low;
...
result = ((UINT64)high << 32) | (UINT64)low;

Do modern compilers turn that into a real barrel shift on high, or optimize it to a simple copy to the right location?

If not, then using a union would seem to be more efficient than the shift that most people appear to use. However, having the compiler optimize this is the ideal solution.

I'm wondering how I should advise people when they do require that extra little bit of performance.

Upvotes: 7

Answers (4)

Matthew

Reputation: 48304

I wrote the following (hopefully valid) test:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

void func(uint64_t x);

int main(int argc, char **argv)
{
#ifdef UNION
  union {
    uint64_t full;
    struct {
      uint32_t low;
      uint32_t high;
    } p;
  } result;
  #define value result.full
#else
  uint64_t result;
  #define value result
#endif
  uint32_t high, low;

  if (argc < 3) return 0;

  high = atoi(argv[1]);
  low = atoi(argv[2]);

#ifdef UNION
  result.p.high = high;
  result.p.low = low;
#else
  result = ((uint64_t) high << 32) | low;
#endif

  // printf("%08x%08x\n", (uint32_t) (value >> 32), (uint32_t) (value & 0xffffffff));
  func(value);

  return 0;
}

Running a diff of the unoptimized output of gcc -S:

<   mov -4(%rbp), %eax
<   movq    %rax, %rdx
<   salq    $32, %rdx
<   mov -8(%rbp), %eax
<   orq %rdx, %rax
<   movq    %rax, -16(%rbp)
---
>   movl    -4(%rbp), %eax
>   movl    %eax, -12(%rbp)
>   movl    -8(%rbp), %eax
>   movl    %eax, -16(%rbp)

I don't know assembly, so it's hard for me to analyze that. However, it looks like some shifting is taking place as expected on the non-union (top) version.

But with optimizations -O2 enabled, the output was identical. So the same code was generated and both ways will have the same performance.

(gcc version 4.5.2 on Linux/AMD64)

Partial output of optimized -O2 code with or without union:

    movq    8(%rsi), %rdi
    movl    $10, %edx
    xorl    %esi, %esi
    call    strtol

    movq    16(%rbx), %rdi
    movq    %rax, %rbp
    movl    $10, %edx
    xorl    %esi, %esi
    call    strtol

    movq    %rbp, %rdi
    mov     %eax, %eax
    salq    $32, %rdi
    orq     %rax, %rdi
    call    func

The snippet begins immediately after the jump generated by the if line.

Upvotes: 4

Jason

Reputation: 32540

EDIT: This response is based on an earlier version of the OP's code that did not have a cast

This code

result = (high << 32) | low;

is actually going to have undefined results. high is a 32-bit value and you are shifting it by 32 bits (the full width of its type), which is undefined behavior in C, so the result depends on how the compiler and target platform happen to handle the shift. OR'ing that undefined value with low does not rescue it, and the whole expression is evaluated in 32 bits anyway, so the end result will most likely not be the 64-bit value you want. For instance, the code emitted by gcc -S on OSX 10.6 looks like:

movl    -4(%rbp), %eax      //retrieving the value of "high"
movl    $32, %ecx          
shal    %cl, %eax           //performing the 32-bit shift on "high"
orl    -8(%rbp), %eax       //OR'ing the value of "low" to the shift op result

So you can see that the shift is only taking place on a 32-bit value in a 32-bit register with a 32-bit assembly instruction. The result ends up being exactly the same as high | low without any shifting at all: x86 uses only the low 5 bits of the count for a 32-bit shift, so shal with a count of 32 in %cl just leaves the value that was originally in EAX. You're not getting a 64-bit result.
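To make that concrete, here is a minimal C model of the behaviour seen in the listing. It is only a sketch: the function names are made up for illustration, and it assumes the x86 rule that a 32-bit shift instruction uses just the low 5 bits of its count. It models what this particular compiler output did, not what the C language guarantees (the uncast expression remains undefined behavior).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit x86 shift-left: the CPU keeps only the
   low 5 bits of the count, so the effective count is always 0..31. */
uint32_t x86_shl32(uint32_t value, unsigned count)
{
    return value << (count & 31u);   /* well defined: count is now < 32 */
}

/* What the uncast expression effectively computed in the listing above. */
uint32_t uncast_combine_on_x86(uint32_t high, uint32_t low)
{
    return x86_shl32(high, 32) | low;   /* 32 & 31 == 0, so this is high | low */
}

/* Small self-check of the model. */
void check_model(void)
{
    assert(x86_shl32(1u, 4) == 16u);                    /* ordinary shift */
    assert(x86_shl32(0xdeadbeefu, 32) == 0xdeadbeefu);  /* count wraps to 0 */
}
```

With this model, uncast_combine_on_x86(high, low) always equals high | low, which matches the assembly above.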

In order to avoid that, cast high to a uint64_t like:

result = ((uint64_t)high << 32) | low;
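As a small sketch of the corrected version, the helpers below wrap the cast-then-shift expression and its inverse. The function names are hypothetical and only there for illustration; the key point is that the cast widens high to 64 bits before the shift, so a count of 32 is smaller than the width of the shifted type.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value.
   The cast happens BEFORE the shift, so the shift is done in 64 bits. */
uint64_t combine_u32(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Extract the upper 32 bits. */
uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Extract the lower 32 bits. */
uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}

/* Splitting a combined value must give back the original halves. */
void check_round_trip(uint32_t high, uint32_t low)
{
    uint64_t v = combine_u32(high, low);
    assert(high_half(v) == high);
    assert(low_half(v) == low);
}
```

For example, combine_u32(0x12345678, 0x9abcdef0) yields 0x123456789abcdef0.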

Upvotes: 2

c-smile

Reputation: 27470

If this is supposed to be platform independent, then the only option is to use shifts here.

With union { r64; struct{low;high}} you cannot tell which half of r64 the low/high fields will map to. Think about endianness.

Modern compilers are pretty good at handling such shifts.
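A rough sketch of the endianness problem, assuming a host that is either little- or big-endian and a struct of two uint32_t members with no padding. All names here (split64, combine_shift, combine_union_low_first, host_is_little_endian) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union in the style discussed above. */
union split64 {
    uint64_t full;
    struct {
        uint32_t first;    /* lives at the lower addresses */
        uint32_t second;   /* lives at the higher addresses */
    } p;
};

/* Returns 1 on a little-endian host, 0 otherwise (checked at run time). */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Portable: the arithmetic does not depend on byte order. */
uint64_t combine_shift(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Union version that assumes the first member is the low half.
   That assumption only holds on little-endian hosts. */
uint64_t combine_union_low_first(uint32_t high, uint32_t low)
{
    union split64 u;
    u.p.first = low;
    u.p.second = high;
    return u.full;
}

/* The shift version must always be right; the union version only on
   little-endian hosts. */
void check_combine(void)
{
    const uint64_t expected = 0x1122334455667788ULL;
    assert(combine_shift(0x11223344u, 0x55667788u) == expected);
    if (host_is_little_endian())
        assert(combine_union_low_first(0x11223344u, 0x55667788u) == expected);
}
```

On a big-endian host the union version would return the halves swapped, so the member order would have to be chosen per platform, while the shift version needs no such adjustment.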

Upvotes: 4

fortran

Reputation: 76127

Modern compilers are smarter than what you might think ;-) (so yes, I think you can expect a barrel shift on any decent compiler).

Anyway, I would use the option whose semantics are closer to what you are actually trying to do.

Upvotes: 4
