OrenIshShalom

Reputation: 7172

Order of global variables in ELF

I have a simple C program with four global variables:

$ cat example.c
int x;
int y;
int z;
int w;

int main()
{
    x = 5;
    y = 6;
    z = 7;
    w = 8;

    return x+y+z+w;
}

When I looked at their addresses in the ELF file, I was surprised that they were not laid out in declaration order (x, y, z, w). Instead, the order was z (0x60102c), x (0x601030), w (0x601034), y (0x601038):

$ clang -g -O0 -o example example.c
$ objdump -S example | cat -n | sed -n '100,123p;124q'
   100  int main()
   101  {
   102    400460:   55                      push   %rbp
   103    400461:   48 89 e5                mov    %rsp,%rbp
   104    400464:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   105      x = 5;
   106    40046b:   c7 04 25 30 10 60 00    movl   $0x5,0x601030
   107    400472:   05 00 00 00 
   108      y = 6;
   109    400476:   c7 04 25 38 10 60 00    movl   $0x6,0x601038
   110    40047d:   06 00 00 00 
   111      z = 7;
   112    400481:   c7 04 25 2c 10 60 00    movl   $0x7,0x60102c
   113    400488:   07 00 00 00 
   114      w = 8;
   115    40048c:   c7 04 25 34 10 60 00    movl   $0x8,0x601034
   116    400493:   08 00 00 00 
   117  
   118      return x+y+z+w;
   119    400497:   8b 04 25 30 10 60 00    mov    0x601030,%eax
   120    40049e:   03 04 25 38 10 60 00    add    0x601038,%eax
   121    4004a5:   03 04 25 2c 10 60 00    add    0x60102c,%eax
   122    4004ac:   03 04 25 34 10 60 00    add    0x601034,%eax
   123    4004b3:   5d                      pop    %rbp

Is this order arbitrary, or is there a specific reason they are not laid out in declaration order? Thanks!

Upvotes: 2

Views: 494

Answers (1)

Florian Weimer

Reputation: 33757

You are using tentative definitions (definitions without an initializer), so the compiler does not actually determine the data layout. There could be a definition in another file (perhaps written in assembler) that imposes a completely different order than the one in the assembler file the compiler produces, and the link editor would then be forced to allocate the objects in that order in the output section.
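To make the term concrete, here is a minimal sketch of the difference between a tentative definition and a regular one. The helper `sum_xy` is hypothetical and exists only for illustration; it is not part of the question's program.

```c
/* Tentative definition: file scope, no initializer, no storage class. */
int x;

/* Legal in C: a second tentative definition of the same object.
 * Both collapse into one definition, zero-initialized, at the end
 * of the translation unit. */
int x;

/* Regular definition: the initializer makes it non-tentative, so the
 * compiler emits it directly into a data section. */
int y = 6;

/* Hypothetical helper, used only to read both objects. */
int sum_xy(void)
{
    return x + y;
}
```

Whether the tentative definition ends up as a common symbol depends on the compiler and its options, so treat this as an illustration of the language rule rather than of any particular layout.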

In my case, Clang actually produces this:

    .type   x,@object               # @x
    .comm   x,4,4
    .type   y,@object               # @y
    .comm   y,4,4
    .type   z,@object               # @z
    .comm   z,4,4
    .type   w,@object               # @w
    .comm   w,4,4

The external assembler (from GNU binutils) turns this into (as shown by eu-readelf -s; readelf -sW should work equally well):

   18: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON x
   19: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON y
   20: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON z
   21: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON w

(COMMON because of the tentative definition.)

The internal assembler in Clang itself produces:

    8: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON w
    9: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON x
   10: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON y
   11: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON z

On my system, BFD ld from binutils turns that into:

   54: 000000000060102c      4 OBJECT  GLOBAL DEFAULT       23 z
   55: 0000000000601030      4 OBJECT  GLOBAL DEFAULT       23 x
   65: 0000000000601034      4 OBJECT  GLOBAL DEFAULT       23 w
   66: 0000000000601038      4 OBJECT  GLOBAL DEFAULT       23 y

Curiously, gold from the same binutils version (2.28) produces:

   25: 0000000000402014      4 OBJECT  GLOBAL DEFAULT       24 w
   26: 0000000000402020      4 OBJECT  GLOBAL DEFAULT       24 z
   27: 000000000040201c      4 OBJECT  GLOBAL DEFAULT       24 y
   28: 0000000000402018      4 OBJECT  GLOBAL DEFAULT       24 x

My best guess is that in BFD ld's case, it just happens to be some hash table iteration order, and gold uses lexicographic symbol ordering.

Note that most of this happens because of tentative definitions and common symbols. The assembler and link editor are not permitted to reorder regular data object definitions within the same section, so if you disable the use of common symbols, you will get whatever the compiler produces in the assembler output. The object definition order is still not defined by the language standard, but you can check the compiler manual if it makes any additional guarantees.
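One way to observe this is to report the layout at run time and compare builds. The sketch below assumes GCC or Clang, where `-fcommon` and `-fno-common` control whether tentative definitions become common symbols. The names `struct entry`, `by_addr`, and `layout_order` are hypothetical helpers, not anything from the question.

```c
#include <stdint.h>
#include <stdlib.h>

int x;
int y;
int z;
int w;

/* Hypothetical helper type: a variable name paired with its address. */
struct entry {
    char name;
    uintptr_t addr;
};

/* qsort comparator: ascending by address. */
static int by_addr(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;
    return (ea->addr > eb->addr) - (ea->addr < eb->addr);
}

/* Writes the four variable names into out, lowest address first,
 * followed by a terminating NUL. The result depends on the compiler,
 * the linker, and the options used, so no particular order is implied. */
void layout_order(char out[5])
{
    struct entry e[] = {
        { 'x', (uintptr_t)&x },
        { 'y', (uintptr_t)&y },
        { 'z', (uintptr_t)&z },
        { 'w', (uintptr_t)&w },
    };
    qsort(e, 4, sizeof e[0], by_addr);
    for (int i = 0; i < 4; i++)
        out[i] = e[i].name;
    out[4] = '\0';
}
```

Calling `layout_order` from `main` and printing the buffer, once in a build with `-fcommon` and once with `-fno-common`, should show whether the linker or the compiler decided the order on a given toolchain.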

Upvotes: 3
