assembly gcc avr interrupt-handling atmelstudio

Reputation: 27

Atmel Studio (GCC) generates a lot of instructions even for an empty ISR function. Can that be optimized?

ISRs take a long time, so I looked at the asm to see what it was doing.

I compile this C with gcc -O3 -mmcu=attiny13a and some other options.

#include <avr/interrupt.h>
ISR(TIM0_COMPA_vect)
{

}

avr-objdump.exe -d test.elf output:

00000048 <__vector_6>:
  48:   1f 92           push    r1
  4a:   0f 92           push    r0
  4c:   0f b6           in  r0, 0x3f    ; 63
  4e:   0f 92           push    r0
  50:   11 24           eor r1, r1
  52:   0f 90           pop r0
  54:   0f be           out 0x3f, r0    ; 63
  56:   0f 90           pop r0
  58:   1f 90           pop r1
  5a:   18 95           reti

Is the assembler code right, although C code is empty?

These links explain a little about ISR(), but don't go into detail about which parts of the asm are required, or whether GCC can be made to optimize away some of the instructions in simple ISRs that don't need them.

https://www.nongnu.org/avr-libc/user-manual/group__avr__interrupts.html ISR() macro
https://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html some details about __attribute__((interrupt)).

GCC's asm output (https://godbolt.org/z/zzbY5KE3c) uses pseudo-instructions like __gcc_isr 1.

Newer GCC (9.2 on Godbolt) supports -mno-gas-isr-prologues to get GCC to show the real instructions that match the disassembly from Atmel Studio above. So if anyone wants to play with this, something that has an effect in https://godbolt.org/z/q6M518qfP will probably have the same effect in real Atmel Studio.

Upvotes: 0

Answers (1)

emacs drives me nuts

Reputation: 3918

Is the assembler code right, although C code is empty?

Yes. This is the code for avr-gcc up to and including v7. Newer versions of the compiler might generate more efficient code, see the GCC v8 Release Notes. The reason is this:

avr-gcc ABI: R0 and R1 Modeling, Usage and Benefits

When the avr-gcc ABI was devised, the decision was to model R0 and R1 as fixed registers. "Fixed register" means that the compiler won't use them in register allocation or in any other way. The only use of these registers was in the final stage of compilation, when assembly code is printed to *.s, where they could be used implicitly in the respective output strings. This is basically the same as instruction output via inline assembly, which is opaque to the compiler.

The reason behind this choice was that overall code quality could be improved by having these extra registers at hand, where R0 is used as a temporary register aka. __tmp_reg__, and R1 aka. __zero_reg__ contains a value of zero. For example, to compare a 16-bit integer in register %0 against 42, you can just

cpi %A0, 42
cpc %B0, __zero_reg__

without any further ado, i.e. no need to allocate some temporary register, clear it etc.
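To see why a register that is guaranteed to hold zero matters here, the following host-side C sketch models how the Z and C flags behave across cpi followed by cpc. The function name eq16_cpi_cpc and its zero_reg parameter are made up for illustration; it is a model of the two instructions, not compiler output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of
 *     cpi %A0, k
 *     cpc %B0, __zero_reg__
 * Returns the Z flag after the sequence, i.e. "value == k" as the
 * hardware would report it. zero_reg is the current content of R1. */
bool eq16_cpi_cpc(uint16_t value, uint8_t k, uint8_t zero_reg)
{
    uint8_t lo = (uint8_t)(value & 0xFF);
    uint8_t hi = (uint8_t)(value >> 8);

    /* cpi: computes lo - k and sets Z and C from the result. */
    bool z = (lo == k);
    bool c = (lo < k);

    /* cpc: computes hi - zero_reg - C. Z can only stay set, never
     * become set, so it reflects the whole 16-bit comparison. */
    uint8_t res = (uint8_t)(hi - zero_reg - (c ? 1 : 0));
    z = z && (res == 0);

    return z;
}
```

With R1 = 0, eq16_cpi_cpc(42, 42, 0) is true and eq16_cpi_cpc(0x012A, 42, 0) is false. If R1 held 1 instead of 0, eq16_cpi_cpc(0x012A, 42, 1) would wrongly report equality, which is why code relies on R1 really being zero.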

Disadvantage of R0 and R1 being Fixed Registers

The disadvantage of this approach is that the compiler has no liveness information for these registers. For example, in multiplication code like

char mul (char x)
{
    return x * x * x * x;
}

you have to reset R1 to 0 according to the ABI because MUL destroys its content:

mul:
    mul r24,r24
    mov r24,r0
    clr r1      ; Superfluous
    mul r24,r24 ; Overrides r1
    mov r24,r0
    clr r1      ; Restore __zero_reg__ to 0
    ret

The first clr r1 is superfluous because the following mul will override it.
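As a rough illustration of why R1 needs resetting at all, here is a small host-side C model of MUL, which writes its 16-bit product to R1:R0. The names avr_regs, avr_mul and mul_model are hypothetical; the sketch only mirrors the listing above and is not how the compiler represents anything internally.

```c
#include <stdint.h>

/* Hypothetical model of the two registers that MUL writes. */
struct avr_regs {
    uint8_t r0;  /* __tmp_reg__  */
    uint8_t r1;  /* __zero_reg__ */
};

/* MUL Rd,Rr: unsigned 8x8 multiply, 16-bit product goes to R1:R0. */
void avr_mul(struct avr_regs *regs, uint8_t rd, uint8_t rr)
{
    uint16_t product = (uint16_t)(rd * rr);
    regs->r0 = (uint8_t)(product & 0xFF);
    regs->r1 = (uint8_t)(product >> 8);
}

/* Mirrors the listing above instruction by instruction. */
uint8_t mul_model(struct avr_regs *regs, uint8_t r24)
{
    avr_mul(regs, r24, r24);  /* mul r24,r24 */
    r24 = regs->r0;           /* mov r24,r0  */
    regs->r1 = 0;             /* clr r1 (superfluous) */
    avr_mul(regs, r24, r24);  /* mul r24,r24 (overrides r1) */
    r24 = regs->r0;           /* mov r24,r0  */
    regs->r1 = 0;             /* clr r1: restore __zero_reg__ */
    return r24;
}
```

For instance, avr_mul with 20 * 20 = 400 = 0x0190 leaves 0x01 in R1. An interrupt arriving right after such a mul would see R1 != 0, which is the situation the ISR prologue has to guard against.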

That ABI design also led to these expensive ISR prologues and epilogues, because no analysis is available on whether R0 and R1 are used or changed; the same goes for SREG. Therefore, a classic ISR prologue has to

Save R0, R1 and SREG.
Set R1 to 0 (because it might temporarily hold a non-0 value like during the mul-sequence from above, but ISR code expects R1=0).

no matter what the body of the ISR is, and the epilogue has to restore them.
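To put a number on "expensive", the sketch below adds up the cycles of the classic prologue and epilogue from the question's disassembly. The cycle counts are an assumption: they are the ones I would expect for a classic AVR core with a 16-bit PC such as the ATtiny13A, and other cores (XMEGA, AVRxt) differ. The function names are made up for illustration.

```c
/* Assumed cycle counts for a classic AVR core with 16-bit PC. */
enum {
    CYC_PUSH = 2,
    CYC_POP  = 2,
    CYC_IN   = 1,
    CYC_OUT  = 1,
    CYC_EOR  = 1,
    CYC_RETI = 4
};

/* push r1; push r0; in r0,SREG; push r0; eor r1,r1 */
unsigned classic_prologue_cycles(void)
{
    return 3 * CYC_PUSH + CYC_IN + CYC_EOR;
}

/* pop r0; out SREG,r0; pop r0; pop r1 */
unsigned classic_epilogue_cycles(void)
{
    return 3 * CYC_POP + CYC_OUT;
}

/* Everything an empty classic ISR executes, including the reti. */
unsigned classic_empty_isr_cycles(void)
{
    return classic_prologue_cycles() + classic_epilogue_cycles() + CYC_RETI;
}
```

Under these assumptions, 15 of the 19 cycles of an empty ISR are pure save / restore overhead. The hardware's interrupt response time and the jump in the vector table are not included.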

avr-gcc v8+ Solution: Pseudo-instruction __gcc_isr in ISRs

Due to the complexity of the problem, it took 12 years from filing PR20296 to its resolution. The bulk of analysis was shifted from the compiler to the assembler by means of a pseudo-instruction __gcc_isr. To see how it works, consider the following C code:

volatile char c;

__attribute__((__signal__))
void __vector_X (void)
{
    ++c;
}

and the assembly code from avr-gcc v8+ -Os -save-temps:

__vector_X:
    __gcc_isr 1
    lds  r24,c
    subi r24,lo8(-1)
    sts  c,r24
    __gcc_isr 2
    reti
    __gcc_isr 0,r24

What the compiler does:

Don't generate __gcc_isr if Binutils don't support it (whether gas accepts -mgcc-isr is determined during configure), if optimization is off, if the ISR is attributed no_gccisr, if -mgas-isr-prologues has been switched off, etc.
Don't generate __gcc_isr if the ISR has open-coded calls or does weird stuff like non-local goto (setjmp / longjmp).
If all goes well, print __gcc_isr pseudo-instructions instead of actual ISR prologue / epilogues.

What the assembler does:

It analyzes the complete ISR code, starting at prologue chunk __gcc_isr 1 up to the final chunk 0, and records usage of R0, R1 and effects on SREG.
Don't analyze code behind function calls: If [r]call is encountered, assume the worst for R0, R1 and SREG. Tail-calls (calls via some jump instruction) have already been handled by the compiler.
Print optimized prologue for chunk 1 and epilogue(s) for chunk(s) 2 according to R0, R1, SREG usage. The register specified with chunk 0 may be used to push / pop SREG because the compiler uses this register anyways.
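The scan described in these steps can be sketched in host-side C roughly as follows. This is a simplified model, not the actual gas implementation: the flag names and isr_needs are hypothetical, and real gas works on opcodes rather than on pre-digested per-instruction flags.

```c
#include <stddef.h>

/* Hypothetical per-instruction summary. */
enum {
    USES_R0   = 1 << 0,  /* reads or writes __tmp_reg__  */
    USES_R1   = 1 << 1,  /* reads or writes __zero_reg__ */
    SETS_SREG = 1 << 2,  /* changes any status flag      */
    IS_CALL   = 1 << 3   /* rcall / call: callee unknown */
};

#define SAVE_ALL (USES_R0 | USES_R1 | SETS_SREG)

/* Scan the ISR body and return the set of resources that the
 * prologue has to save and the epilogue has to restore. */
unsigned isr_needs(const unsigned *insn_flags, size_t n)
{
    unsigned needs = 0;

    for (size_t i = 0; i < n; i++) {
        /* Code behind a call is not analyzed: assume the worst. */
        if (insn_flags[i] & IS_CALL)
            return SAVE_ALL;
        needs |= insn_flags[i] & SAVE_ALL;
    }
    return needs;
}
```

For the ++c example, the body is lds, subi, sts. Only subi changes SREG, so the scan yields SETS_SREG alone, and neither R0 nor R1 needs to be saved.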

For the example from above, the final code will be:

<__vector_X>:
    8f 93           push r24
    8f b7           in   r24, 0x3f  ; SREG
    8f 93           push r24
    80 91 60 00     lds  r24, 0x0060    ; <c>
    8f 5f           subi r24, 0xFF
    80 93 60 00     sts  0x0060, r24    ;  <c>
    8f 91           pop r24
    8f bf           out 0x3f, r24   ; SREG
    8f 91           pop r24
    18 95           reti

The obvious advantage of letting the assembler do the analysis is that it even works for code from inline assembly, which is opaque to GCC.

"Does inline asm matter?": A Note on Inline Assembly

The first thing to notice is that we get analysis of inline asm for free with the current approach. Handling of inline asm was not the reason behind the decision to let gas do the work, though; it's just a nice side effect. So what follows is basically a TL;DR of why we use gas as the workhorse.

It's correct that inline asm must make all side effects explicit, with a grain of salt:

Before the cc0→CCmode transition, there was no condition code register to clobber, so the assumption would be that basically each and every insn clobbers cc0. The situation didn't change much with the introduction of CCmode (it actually got worse): compare insns set CC, but almost every other insn besides branches or super-simple 1-instruction insns clobbers CC.

The reason is that many insns have very complex insn output printers, for example for specific arithmetic or multi-byte loads / stores. It's not possible with any reasonable amount of work to model exact CC behavior on that level, hence just assume a CC clobber. This also applies to inline asm: Since the advent of CCmode, the avr backend just adds "cc" clobbers to all inline assembly so that legacy code won't break, see avr.cc.

Similar situation for tmp_reg: Insn printers will use it implicitly now and then, so the compiler cannot work out its usage / clobber status with any reasonable precision, even if it were an ordinary, allocated register and not a fixed one.

Same for zero_reg, which is also fixed. Some insn printers will just use it in special cases, and it's not possible to model this in a reasonable way, either. As you already noticed, insns (and inline asm) may assume zero_reg = 0, which is the reason why ISR functions with a single

asm ("sts 0,__zero_reg__");

will work and instantiate zero_reg magically.

And of course, it's not possible to add implicit operands like "r" (0) to inline asm; even if it were possible, this would break existing code. Clobbering R0 or R1 is still void because they are fixed, so we don't want to rely on clobbers being present. And technically, an inline asm that clobbers zero_reg and then restores it to 0 does not clobber it. However, ISRs would still need to know.

Upvotes: 1