Reputation: 7172
I have a simple C program with four global variables:
$ cat example.c
int x;
int y;
int z;
int w;
int main()
{
x = 5;
y = 6;
z = 7;
w = 8;
return x+y+z+w;
}
When I looked at their locations in the ELF file, I was surprised that they were not laid out in declaration order (x, y, z, w). Instead, the order was z (0x60102c), x (0x601030), w (0x601034), y (0x601038):
$ clang -g -O0 -o example example.c
$ objdump -S example | cat -n | sed -n '100,123p;124q'
100 int main()
101 {
102 400460: 55 push %rbp
103 400461: 48 89 e5 mov %rsp,%rbp
104 400464: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
105 x = 5;
106 40046b: c7 04 25 30 10 60 00 movl $0x5,0x601030
107 400472: 05 00 00 00
108 y = 6;
109 400476: c7 04 25 38 10 60 00 movl $0x6,0x601038
110 40047d: 06 00 00 00
111 z = 7;
112 400481: c7 04 25 2c 10 60 00 movl $0x7,0x60102c
113 400488: 07 00 00 00
114 w = 8;
115 40048c: c7 04 25 34 10 60 00 movl $0x8,0x601034
116 400493: 08 00 00 00
117
118 return x+y+z+w;
119 400497: 8b 04 25 30 10 60 00 mov 0x601030,%eax
120 40049e: 03 04 25 38 10 60 00 add 0x601038,%eax
121 4004a5: 03 04 25 2c 10 60 00 add 0x60102c,%eax
122 4004ac: 03 04 25 34 10 60 00 add 0x601034,%eax
123 4004b3: 5d pop %rbp
Is this order arbitrary, or is there a specific reason the variables are not laid out in declaration order? Thanks!
Upvotes: 2
Views: 494
Reputation: 33757
You are using tentative definitions (file-scope declarations without an initializer), so the compiler does not actually determine the data layout. There could be a definition in another file (perhaps written in assembler) that imposes a completely different order from the one the compiler emits in its assembler output, and the link editor would then be forced to allocate the objects in that order in the output section.
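To make the distinction concrete, here is a minimal sketch of a tentative definition next to an ordinary one. The helper names initial_x and initial_y are made up for this illustration and are meant to be called from a small main of your own:

```c
/* Tentative definition: file scope, no initializer. */
int x;

/* Legal in C: a second tentative definition of the same object. */
int x;

/* Ordinary definition: the initializer rules out "tentative". */
int y = 6;

/* Hypothetical helpers, only for this illustration. */

/* Value of x before anything assigns to it. Objects with static
   storage duration are zero-initialized, so this is 0. */
int initial_x(void)
{
    return x;
}

/* Value of y as set by its initializer. */
int initial_y(void)
{
    return y;
}
```

If no translation unit supplies an initialized definition of x, the tentative definitions behave as if x had been defined with an initializer of 0.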
In my case, Clang actually produces this:
.type x,@object # @x
.comm x,4,4
.type y,@object # @y
.comm y,4,4
.type z,@object # @z
.comm z,4,4
.type w,@object # @w
.comm w,4,4
The external assembler (from GNU binutils) turns this into the following (as shown by eu-readelf -s; readelf -sW should work equally well):
18: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON x
19: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON y
20: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON z
21: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON w
(COMMON because of the tentative definitions.)
The internal assembler in Clang itself produces:
8: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON w
9: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON x
10: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON y
11: 0000000000000004 4 OBJECT GLOBAL DEFAULT COMMON z
On my system, BFD ld from binutils turns that into:
54: 000000000060102c 4 OBJECT GLOBAL DEFAULT 23 z
55: 0000000000601030 4 OBJECT GLOBAL DEFAULT 23 x
65: 0000000000601034 4 OBJECT GLOBAL DEFAULT 23 w
66: 0000000000601038 4 OBJECT GLOBAL DEFAULT 23 y
Curiously, gold from the same binutils version (2.28) produces:
25: 0000000000402014 4 OBJECT GLOBAL DEFAULT 24 w
26: 0000000000402020 4 OBJECT GLOBAL DEFAULT 24 z
27: 000000000040201c 4 OBJECT GLOBAL DEFAULT 24 y
28: 0000000000402018 4 OBJECT GLOBAL DEFAULT 24 x
My best guess is that in BFD ld's case, it just happens to be some hash table iteration order, and gold uses lexicographic symbol ordering.
Note that most of this happens because of tentative definitions and common symbols. The assembler and link editor are not permitted to reorder regular data object definitions within the same section, so if you disable the use of common symbols, you will get whatever order the compiler produces in the assembler output. The object definition order is still not defined by the language standard, but you can check the compiler manual to see whether it makes any additional guarantees.
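As a rough sketch of how to check the layout yourself, the variant below gives each variable an initializer, so they are ordinary definitions rather than common symbols (GCC and Clang also accept -fno-common to avoid common symbols for uninitialized globals). The helper in_declaration_order is a name invented for this example; call it from your own main. Whether it returns 1 depends on the toolchain, because the language standard does not promise any order:

```c
#include <stdint.h>
#include <stdio.h>

/* Initialized definitions: typically emitted into .data by the
   compiler, not as common symbols. */
int x = 5;
int y = 6;
int z = 7;
int w = 8;

/* Hypothetical helper: prints each variable's address and returns 1
   if the addresses ascend in declaration order (x, y, z, w), else 0. */
int in_declaration_order(void)
{
    const char *names[] = { "x", "y", "z", "w" };
    int *vars[] = { &x, &y, &z, &w };
    int ordered = 1;

    for (int i = 0; i < 4; i++) {
        printf("%s at %p\n", names[i], (void *)vars[i]);
        /* Compare as integers; relational comparison of pointers to
           unrelated objects is not defined by the standard. */
        if (i > 0 && (uintptr_t)vars[i - 1] >= (uintptr_t)vars[i])
            ordered = 0;
    }
    return ordered;
}
```

Treat the result as an observation about one particular compiler and linker, not as something portable code can rely on.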
Upvotes: 3