Reputation: 1307
I'm working on a large codebase that needs to be formatted. After many hiccups, it has been formatted with a clean start-to-finish run of clang-format, and I've moved on to comparing the object files from the original build and the formatted build as a way to check clang-format's work.
Much to my surprise, the compiled binaries are not the same. The only difference I have found so far is in the immediate offsets of a bunch of mov/add/store instructions:
file format elf64-littleaarch64
430c430
< 628: 52803e63 mov w3, #0x1f3 // #499
---
> 628: 52803ac3 mov w3, #0x1d6 // #470
The change appears to be consistent within each file (every mov/add in this file changes by 29 decimal), but not across files: another file I've checked shows an offset of 10 decimal between the new and old binaries.
The only explanation I can think of is that some string identifying the file has changed, but I don't see how that would happen, and it doesn't show up in the diff of the disassembly.
What could cause this change in offset? The only difference between the two source files is that one had clang-format run on it, with a nearly default set of rules. Here is another excerpt of the disassembly diff, including the interleaved source line:
461c466
< if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) {
---
> if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
465c470
< 1068: 91220001 add x1, x0, #0x880
---
> 1068: 91224001 add x1, x0, #0x890
473c478
< 1088: 91222001 add x1, x0, #0x888
---
> 1088: 91226001 add x1, x0, #0x898
And the relevant assembly from one of the files (the ">" side of the diff above):
if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
105c: f94037e0 ldr x0, [sp, #104]
1060: 91004c02 add x2, x0, #0x13
1064: 90000000 adrp x0, 1000 <pciFindDevice+0x20>
1068: 91224001 add x1, x0, #0x890
106c: aa0203e0 mov x0, x2
1070: 97ffff20 bl cf0 <strcmp@plt>
1074: 7100001f cmp w0, #0x0
1078: 540006a0 b.eq 114c <pciFindDevice+0x16c> // b.none
107c: f94037e0 ldr x0, [sp, #104]
1080: 91004c02 add x2, x0, #0x13
1084: 90000000 adrp x0, 1000 <pciFindDevice+0x20>
1088: 91226001 add x1, x0, #0x898
108c: aa0203e0 mov x0, x2
Upvotes: 2
Views: 202
Reputation: 144969
If your code uses assert macros, their expansion in debug builds (when NDEBUG is not defined) does generate code that depends on line numbering: the __LINE__ macro expands to a different value, which is passed to the reporting function (fprintf, or a library helper such as glibc's __assert_fail) to produce the diagnostic with the file name and line number.
You can compare the code generated with assertions disabled, for example by compiling both versions with -DNDEBUG.
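A minimal sketch of what an assert-style check bakes into the object code. CHECK_SITE is a hypothetical macro, not the real assert; it only captures the three things a C assert() reports on failure: the text of the expression, __FILE__ and __LINE__. Note that the expression text also changes when clang-format merely adjusts spacing inside the parentheses, so a stringified assert argument can grow even when its line number does not change, which would shift the string constants stored after it.

```c
/* Hypothetical stand-in for assert(): rather than testing and aborting,
 * it only captures what assert() passes to its failure handler. */
struct check_site {
    const char *expr; /* #expr: the argument's text; whitespace between
                         tokens becomes a single space, none stays none */
    const char *file; /* __FILE__ */
    int line;         /* __LINE__: compiled in as an immediate constant */
};

#define CHECK_SITE(expr) ((struct check_site){#expr, __FILE__, __LINE__})

/* The same check as written before and after clang-format: only the
 * spacing differs. The second copy also sits on a later line, as it
 * would if the formatter had inserted lines above it. */
static struct check_site before_format(void) { return CHECK_SITE(a==b); }

static struct check_site after_format(void) {
    return CHECK_SITE(a == b);
}
```

Here before_format().expr is "a==b" (5 bytes with the NUL) while after_format().expr is "a == b" (7 bytes), and the two line fields differ.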
You should also grep for __LINE__ in the source code and include files to identify other uses of line numbers in custom macros.
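A custom macro can also turn __LINE__ into text, which changes not just a value but the size of a string constant. The sketch below assumes a hypothetical HERE macro that pastes "file:line" into one string literal, as some logging macros do; the #line directives only simulate the same statement sitting on line 99 before a reformat and on line 100 after it.

```c
/* Two-level stringification: the inner macro receives __LINE__ already
 * expanded to a number, so #x yields e.g. "99" rather than "__LINE__". */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Hypothetical custom macro: the location becomes part of a single
 * string literal, so the line number ends up in .rodata as text. */
#define HERE __FILE__ ":" STRINGIFY(__LINE__)

/* Simulate the same declaration before (line 99) and after (line 100)
 * a formatter moved it down by one line. */
#line 99
static const char site_before[] = HERE;
#line 100
static const char site_after[] = HERE;
```

Because "100" is one character longer than "99", site_after is one byte larger than site_before, and any constant laid out after such a string can move.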
From the details in the question update, the differing data is the offset of the string constants "." and ".." used as arguments to strcmp. The final values of these offsets come from the linker, after global object code optimisation by LLVM. Most or all string constants in this and other modules may be shifted in the executable, causing the offsets to change in many places in the executable code.
The code posted shows a difference only as a side effect of something else, which may be in a different source file.
You can try to identify where this difference starts by loading both executables in a hex editor (e.g. qemacs) and searching for longer string constants to compare the binaries, scanning backwards to find the first difference that explains the shift. If this difference is a string constant that differs between the executables, you will have a good candidate to investigate further.
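The hex-editor search can also be scripted. A minimal sketch, assuming both images (the two executables, or just their .rodata sections) have already been read into memory: first_difference finds the first differing byte, much as cmp does, and string_shift reports how far a given string constant moved. Checking the shift of several long strings, working backwards from the end of the image, narrows down where it first appears. All three function names are hypothetical helpers, not part of any tool.

```c
#include <stddef.h>
#include <string.h>

/* Offset of the first byte where the two images differ (or where the
 * shorter one ends), or -1 if they are identical. */
static long first_difference(const unsigned char *a, size_t na,
                             const unsigned char *b, size_t nb) {
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return na == nb ? -1 : (long)n;
}

/* Offset of the first occurrence of needle[0..m) in hay[0..n), or -1.
 * A portable stand-in for the non-standard memmem(). */
static long find_bytes(const unsigned char *hay, size_t n,
                       const unsigned char *needle, size_t m) {
    if (m == 0 || m > n)
        return -1;
    for (size_t i = 0; i + m <= n; i++)
        if (memcmp(hay + i, needle, m) == 0)
            return (long)i;
    return -1;
}

/* How far a string constant (searched with its terminating NUL) moved
 * from the old image to the new one; positive means it moved later.
 * Sets *found to 0 and returns 0 if it is missing from either image. */
static long string_shift(const unsigned char *old_img, size_t old_n,
                         const unsigned char *new_img, size_t new_n,
                         const char *text, int *found) {
    size_t m = strlen(text) + 1;
    long a = find_bytes(old_img, old_n, (const unsigned char *)text, m);
    long b = find_bytes(new_img, new_n, (const unsigned char *)text, m);
    *found = (a >= 0 && b >= 0);
    return *found ? b - a : 0;
}
```

A string whose shift is zero lies before the change and one with a non-zero shift lies after it, so the boundary between the two is where to look for the string that actually differs.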
Upvotes: 2