Cedar

Reputation: 887

Align C function to "odd" address

I know from "C Function alignment in GCC" that I can align functions using

    __attribute__((optimize("align-functions=32")))

Now, what if I want a function to start at an "odd" address, as in, I want it to start at an address of the form 32(2k+1), where k is any integer?

I would like the function to start at address (decimal) 32 or 96 or 160, but not 0 or 64 or 128.

Context: I'm doing a research project on code caches, and I want a function aligned in one level of cache but misaligned in another.

Upvotes: 2

Views: 1437

Answers (3)

ecm

Reputation: 2763

As this question is tagged assembly, here are two spots in my (NASM 8086) sources that "anti-align" the following instructions and data. (Here the alignment is just to even addresses, i.e. 2-byte alignment, so anti-aligned means an odd address.) Both are based on the calculation done by NASM's align macro.

https://hg.ulukai.org/ecm/ldebug/file/683a1d8ccef9/source/debug.asm#l1161

        times 1 - (($ - $$) & 1) nop    ; align in-code parameter
        call entry_to_code_sel, exc_code

https://hg.ulukai.org/ecm/ldebug/file/683a1d8ccef9/source/debug.asm#l7062

                ; $ - $$        = offset into section
                ; % 2           = 1 if odd offset, 0 if even
                ; 2 -           = 1 if odd, 2 if even
                ; % 2           = 1 if odd, 0 if even
        ; resb (2 - (($-$$) % 2)) % 2
                ; $ - $$        = offset into section
                ; % 2           = 1 if odd offset, 0 if even
                ; 1 -           = 0 if odd, 1 if even
        resb 1 - (($-$$) % 2)           ; make line_out aligned
trim_overflow:  resb 1                  ; actually part of line_out to avoid overflow of trimputs loop
line_out:       resb 263
                resb 1                  ; reserved for terminating zero
line_out_end:

Here is a simpler way to achieve anti-alignment:

                align 2
                nop

This is more wasteful, though: it uses 2 bytes if the position before this sequence already satisfied the target anti-alignment. My prior examples do not reserve any more space than necessary.
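To make the space comparison concrete, here is a small sketch that models both variants for the 2-byte case. The function names are hypothetical, and it assumes `nop` assembles to a single byte, as it does on 8086.

```python
def exact_pad(offset: int) -> int:
    """Bytes emitted by `times 1 - (($ - $$) & 1) nop` at this section offset."""
    return 1 - (offset & 1)


def align_then_nop_pad(offset: int) -> int:
    """Bytes emitted by `align 2` followed by a one-byte `nop`."""
    align_bytes = (-offset) % 2  # padding inserted by `align 2`
    return align_bytes + 1       # plus the nop itself


# Both variants end on an odd offset; only the cost differs.
for offset in range(4):
    print(offset, exact_pad(offset), align_then_nop_pad(offset))
```

At an odd starting offset the first form emits nothing, while `align 2` plus `nop` emits two bytes.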

Upvotes: 2

Peter Cordes

Reputation: 366096

GCC doesn't have options to do that.

Instead, compile to asm and do some text manipulation on that output. For example, run gcc -O3 -S foo.c, then run a script on foo.s to odd-align before the function labels you care about, then build the final executable with gcc -o benchmark foo.s.

One simple way, which costs between 32 and 95 bytes of padding, is:

 .balign 64        # byte-align by 64
 .space 32         # emit 32 bytes (of zeros)
starts_half_way_into_a_cache_line:
testfunc1:

Tweaking GCC/clang output after compilation is in general a good way to explore what gcc should have done. All references to other code/data inside and outside the function use symbol names. Nothing depends on relative distances between functions or on absolute addresses until after you assemble (and link), so editing the asm source at this point is totally safe. (Another answer proposes copying final machine code around; that's very fragile, see the comments under it.)

An automated text-manipulation script will let you run your experiment on larger amounts of code. It can be as simple as

    awk '/^testfunc.*:/ { print ".p2align 6; .skip 32"; print $0 }' foo.s

to do this before every label that matches the pattern ^testfunc.*. (Assuming no leading underscore name mangling.)

Or even use sed, which has a convenient -i option to do it "in-place" by renaming the output file over the original; perl has something similar. Fortunately, compiler output is pretty formulaic, so for a given compiler it should be a pretty easy pattern-matching problem.
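If you'd rather keep the rewriting step in a script you can extend, the same transformation could be sketched in Python as below. The function name `odd_align` and the sample input are made up for illustration; the label pattern is the same `^testfunc.*:` as in the awk one-liner, and the same assumption about no leading-underscore mangling applies.

```python
import re

# Same pattern as the awk one-liner: a label at the start of a line.
LABEL = re.compile(r"^testfunc.*:")


def odd_align(asm_text: str) -> str:
    """Insert '.p2align 6; .skip 32' before every matching function label."""
    out = []
    for line in asm_text.splitlines():
        if LABEL.match(line):
            out.append(".p2align 6; .skip 32")
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fragment of compiler output, just to show the effect.
sample = "\t.text\n\t.globl\ttestfunc1\ntestfunc1:\n\tret\n"
print(odd_align(sample), end="")
```

Directive lines such as `.globl testfunc1` start with whitespace in typical GCC output, so they don't match; only the label line gets the padding in front of it.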


Keep in mind that the effects of code-alignment aren't always purely local. Branches in one function can alias (in the branch-predictor) with branches from another function depending on alignment details.

It can be hard to know exactly why a change affects performance, especially if you're talking about early in a function where it shifts branch addresses in the rest of the function by a couple bytes. You're not talking about changes like that, though, just shifting the whole function around. But it will change alignment relative to other functions, so tests that call multiple functions alternating with each other, or if the functions call each other, can be affected.

Other effects of alignment include uop-cache packing on modern x86, as well as alignment relative to fetch blocks. (This is beyond the obvious effect of leaving unused space in an I-cache line.)


Ideally you'd only insert 0..63 bytes to reach a desired position relative to a 64-byte boundary. This section is a failed attempt at getting that to work.

.p2align and .balign (see footnote 1) support an optional 3rd arg which specifies a maximum amount of padding, so we're close to being able to do it with GAS directives. We can maybe build on that to detect whether we're close to an odd or even boundary by checking whether it inserted any padding or not. (Assuming we're only talking about 2 cases, not the 4 cases of 16-byte relative to 64-byte for example.)

# DOESN'T WORK, and maybe not fixable
1:  # local label
 .balign 64,,31     # pad with up to 31 bytes to reach 64-byte alignment
2:
 .balign  32        # byte-align by 32, maybe to the position we want, maybe not
.ifne 2b - 1b
  # there is space between labels 2 and 1 so that balign reached a 64-byte boundary
  .space  32
.endif       # else it was already an odd boundary

But unfortunately this doesn't work: Error: non-constant expression in ".if" statement. If the code between the 1: and 2: labels has fixed size, like .long 0xdeadbeef, it will assemble just fine. So apparently GAS won't let you query with a .if how much padding an alignment directive inserted.
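For reference, the 0..63-byte padding the directives above were trying to produce is simple arithmetic once the offset is known. The sketch below is only that arithmetic, with hypothetical function names; it does not solve the problem of getting the offset out of GAS at assembly time.

```python
def minimal_pad(offset: int, inner: int = 32, outer: int = 64) -> int:
    """Bytes needed so that offset + pad is `inner` modulo `outer`."""
    return (inner - offset) % outer


def simple_pad(offset: int, inner: int = 32, outer: int = 64) -> int:
    """Cost of `.balign 64` followed by `.space 32`."""
    return (-offset) % outer + inner


for offset in (0, 1, 32, 33, 64):
    print(offset, minimal_pad(offset), simple_pad(offset))
```

Over all starting offsets, `minimal_pad` stays in the range 0..63, while `simple_pad` covers the 32..95 range quoted for the `.balign 64` / `.space 32` version.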

Footnote 1: .align is either .p2align (power of 2) or .balign (byte) depending on which target you're assembling for. Instead of remembering which is which on which target, I'd recommend always using .p2align or .balign, not .align.

Upvotes: 4

user545199

Reputation:

I believe GCC only lets you align on powers of 2.

If you want to get around this for testing, you could compile your functions as position-independent code (-fPIC or -fPIE) and then write a separate loader that manually copies the function into an area that was mmap'd as read/write. Then you can change the permissions to make it executable. Of course, for a proper performance comparison, you would want to make sure the aligned code that you are comparing it against was also compiled with -fPIC/-fPIE.

I can probably give you some example code if you need it, just let me know.

Upvotes: 0
