Brydon Gibson

Reputation: 1307

clang-format causing compiled object files to change

I'm working on a large codebase that needs to be formatted. With many hiccups, it's been formatted with a clean start to finish run of clang-format, and I've moved on to comparing the object files between the original build and the formatted build, as an attempt to check clang-format's work.

Much to my surprise, the compiled binaries are not the same. The only difference I have found so far is in the immediate operands of a number of mov/add/store instructions:

file format elf64-littleaarch64
430c430
<      628:     52803e63        mov     w3, #0x1f3                      // #499
---
>      628:     52803ac3        mov     w3, #0x1d6                      // #470

The change appears to be consistent within each file (every mov/add in this file changes by 29 decimal), but not across files: in another file I've checked, the old and new binaries differ by 10 decimal.

The only thing I can think of is some sort of string that identifies the file has changed, but I don't see how that would happen, and it's not coming out in the diff of the disassembly.

What could be the cause of this change in offset? The only difference between the two source files is that one had clang-format run on it, with a near-default set of rules.

Edit: a more detailed example:

 461c466
<     if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) {
---
>         if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
465c470
<     1068:     91220001        add     x1, x0, #0x880
---
>     1068:     91224001        add     x1, x0, #0x890
473c478
<     1088:     91222001        add     x1, x0, #0x888
---
>     1088:     91226001        add     x1, x0, #0x898

And the relevant assembly from one of the files

if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
105c:       f94037e0        ldr     x0, [sp, #104]
1060:       91004c02        add     x2, x0, #0x13
1064:       90000000        adrp    x0, 1000 <pciFindDevice+0x20>
1068:       91224001        add     x1, x0, #0x890
106c:       aa0203e0        mov     x0, x2
1070:       97ffff20        bl      cf0 <strcmp@plt>
1074:       7100001f        cmp     w0, #0x0
1078:       540006a0        b.eq    114c <pciFindDevice+0x16c>  // b.none
107c:       f94037e0        ldr     x0, [sp, #104]
1080:       91004c02        add     x2, x0, #0x13
1084:       90000000        adrp    x0, 1000 <pciFindDevice+0x20>
1088:       91226001        add     x1, x0, #0x898
108c:       aa0203e0        mov     x0, x2

Upvotes: 2

Views: 202

Answers (1)

chqrlie

Reputation: 144969

If your code uses assert macros, their expansion when assertions are enabled (that is, when NDEBUG is not defined) generates code that depends on line numbering: the __LINE__ macro expands to a different value once the source has been reformatted, and that value is passed to the reporting function (fprintf, for example) to produce the diagnostic with the file name and line number.
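To make the effect concrete, here is a minimal sketch of how a line number ends up as an immediate operand. MY_ASSERT, report_failure, check_positive and checked_half are hypothetical names used only for illustration; the real expansion of assert depends on your C library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reporting helper: prints a diagnostic and returns the line. */
static int report_failure(const char *expr, const char *file, int line)
{
    fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    return line;
}

/* Hypothetical stand-in for an enabled assert(): __LINE__ is replaced by an
   integer constant at each use, so it is compiled in as an immediate value. */
#define MY_ASSERT(cond) \
    ((cond) ? 0 : report_failure(#cond, __FILE__, __LINE__))

/* Returns 0 when the check passes, otherwise the line number of the check. */
static int check_positive(int x)
{
    return MY_ASSERT(x > 0);
}

/* The standard macro behaves similarly unless NDEBUG is defined. */
static int checked_half(int x)
{
    assert(x % 2 == 0);
    return x / 2;
}
```

If reformatting moves the MY_ASSERT line in check_positive, the constant loaded for the line argument changes by the same amount, which would match a uniform shift of every such immediate within one file.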

You can compare the code generated with assertions disabled, for example by compiling both versions with -DNDEBUG.

You should also grep for __LINE__ in the source code and include files to identify other potential uses of the line numbers in custom macros.

From the details in the question update, the differing data is the offset of the string constants for "." and ".." used as arguments for strcmp. The final values for these offsets come from the linker, after any global object code optimisation by the toolchain (LLVM, for example). Most or all string constants in this and other modules may be shifted in the executable, causing the offsets to change in many places in the executable code.

The difference shown in the posted code is therefore only a side effect of some other change, which may be in a different source file.

You can try to identify where this difference starts by loading both executables in a hex editor (e.g. qemacs) and searching for longer string constants to compare the binaries, then scanning backwards to find the first difference that explains the shift. If this difference is a string constant that differs between the executables, you will have a good candidate to investigate further.
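As a rough alternative to scanning by hand, the first differing byte can be located programmatically. This is only a sketch: first_difference is a hypothetical helper, and it reports the first raw byte mismatch, which is not necessarily the string constant responsible for the shift.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the offset of the first differing byte between two files,
   -1 if the files are identical, or -2 if either file cannot be opened.
   A length mismatch counts as a difference at the shorter file's size. */
static long first_difference(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0;
    long result = -1;

    if (a == NULL || b == NULL) {
        result = -2;
    } else {
        for (;;) {
            int ca = fgetc(a);
            int cb = fgetc(b);
            if (ca != cb) {        /* mismatch, or one file ended early */
                result = offset;
                break;
            }
            if (ca == EOF)         /* both files ended together */
                break;
            offset++;
        }
    }

    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

Calling first_difference on the old and new binaries gives an offset to jump to in the hex editor; from there, the surrounding bytes show whether the mismatch lies in string data or in code.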

Upvotes: 2
