cperformancestructcompiler-optimizationpass-by-value

Reputation: 71

Efficiency of structs of length 8, and uint64_t

TLDR: are 8-byte structures handled just as efficiently as 8-byte uint64_t?

I have 3 data structures that are very similar. They are 6, 7 and 8 bytes long. I want to put them all in uint64_t variables. The goal is that comparisons and assignments will be very efficient. (These values are used as key in several (large) trees).

Example: I have defined the following data-structure, for the one that is 7 bytes long.

typedef struct {
  union {
    uint64_t raw;
    struct {
      uint8_t unused;
      uint8_t node_number;
      uint8_t systemid[SYSTEMID_LENGTH]; /* SYSTEMID_LENGTH is 6 bytes. */
    } nodeid;
  };
} nodeid_t;

I can now do quick assignments and copies via the raw union member.

nodeid_t x, y;

x.raw = y.raw
if (x.raw > y.raw) {

Etc, etc.

My question is about use in functions and in assignments. When I pass this struct by value, will the compiler (gcc) recognize that these structures are 8 bytes long. And therefore treat as if they are int64_t?

Example: Will there be efficiency/performance differences between:

int64_t my_function();
nodeid_t my_function();

In other words, will gcc use 1 instruction to put a nodeid_t on the stack, as if it was an integer? Or will it create a loop, and copy 8 bytes one by one? Does this depend on -O optimization?

Same question for assignment.

int64_t a, b;
nodeid_t x, y;

a = b; /* One machine instruction, I hope. */
x = y; /* Also one instruction, or will it do a loop ? */

Upvotes: 4

Answers (6)

Art

Reputation: 20402

It depends on your architecture. But assuming that you're on x86_64 (which is the most likely), you don't need to do the union hack for copying and function arguments (you'd still need it for comparisons).

struct foo {
    char a;
    char b;
    short c;
    int d;
};

void
foo_copy(struct foo *a, struct foo *b)
{
    *a = *b;
}


extern void bar(struct foo a);
void
foo_value(void)
{
    struct foo f = { .a = 1 };
    bar(f);
}
$ cc -fomit-frame-pointer -O2 -S foo.c
$ cat foo.s
[... cleaned up ...]
_foo_copy:                              ## @foo_copy
    movq    (%rsi), %rax
    movq    %rax, (%rdi)
    retq

_foo_value:                             ## @foo_value
    movl    $1, %edi
    jmp _bar                    ## TAILCALL

Different architectures will have different requirements, a strict alignment architecture for example wouldn't be able to do the copy unless the ABI requires larger than usual alignment. Other ABIs might have different calling conventions for structs. So this is hard to answer generally. But if you're on x86_64, you probably either don't need to waste time doing this optimization, or if you want the comparisons to be efficient, this will work like you want.

Upvotes: 1

doron

Reputation: 28932

Instead of all this fancy footwork, why not simply use memcmp, this works for any contiguous data-type is probably implemented as a compile intrinsic so should be fast and definitely properly sidesteps the strict aliasing rule.

Upvotes: 0

Gryz

Reputation: 71

I am now inclined to define my 3 datastructures simply as typedef uint64_t.

typedef uint64_t isis_simple_item_t;
typedef struct isis_complex_item_t {
  byte_t unused;
  byte_t node_number;
  byte_t systemid[ISIS_SYSTEMID_SIZE];
};

byte_t number;
isis_simple_item_t nodeid;

number = ((isis_complex_item) nodeid).node_number;

This way I can do quick compares, assignement, function returns, function parameters-by-value, etc.

And then when I need to access one of the members inside the struct, which happens a lot less, I'll use wrapper functions. With casts inside them, from uint64_t to the more complex struct. That also means I don't need the union anymore.

Upvotes: 0

yb303

Reputation: 1439

Portability aside, if you're after the ultimate low latency, you're on the right track. I've been doing the same for many years.
A few things to note though:
1. Your code, with chars only, should work as is, because the alignment requirement for char is 1.
2. With wider types you'd need to pack struct nodeid. In gcc you do it with __attribute__((packed)). I think MSVC uses #pragma push pack(1)...#pragma pop.
3. Gcc used to have bugs around this area (gaps between bit fields, wrong alignment...) so I strongly suggest using compile-time checks, like STATIC_ASSERT(sizeof(nodeid_t) == sizeof(uint64_t))
4. If some of the 8 bytes are not populated, make sure you put zeros or something in them. Otherwise your comparisons etc would use random values.

Upvotes: 1

Bathsheba

Reputation: 234865

You cannot be certain that the union has the same size as uint64_t.

This is due to packing in the nodeid struct: compilers will often insert gaps between struct members. Some compilers allow you to change the packing arrangements but then your code will not be portable.

It would be safer to have an array of uint8_t: then the memory would be contiguous.

A compiler will then simply copy the memory on assignment, so you may as well use nodeid_t as your function return types.

Your second job is to rename nodeid_t: _t suffixes are reserved in POSIX C.

Upvotes: 1

dbush

Reputation: 225007

For something like this, efficiency will likely not be a concern.

That said, this will probably not do what you intend:

if (x.raw > y.raw) {

If you're running on a machine with a little-endian architecture, the least significant byte is stored first. If that's the case, then if for example you have this:

x.nodeid.systemid[0] = 1;
x.nodeid.systemid[1] = 2;
y.nodeid.systemid[0] = 2;
y.nodeid.systemid[1] = 1;

Then (x.raw > y.raw) would evaluate to true.

Upvotes: 0

Efficiency of structs of length 8, and uint64_t

TLDR: are 8-byte structures handled just as efficiently as 8-byte uint64_t?

Answers (6)

Related Questions