Reputation: 27
ISRs take a long time, so I looked at the asm to see what it was doing.
I compile this C with gcc -O3 -mmcu=attiny13a and some other options.
#include <avr/interrupt.h>
ISR(TIM0_COMPA_vect)
{
}
avr-objdump.exe -d test.elf
output:
00000048 <__vector_6>:
48: 1f 92 push r1
4a: 0f 92 push r0
4c: 0f b6 in r0, 0x3f ; 63
4e: 0f 92 push r0
50: 11 24 eor r1, r1
52: 0f 90 pop r0
54: 0f be out 0x3f, r0 ; 63
56: 0f 90 pop r0
58: 1f 90 pop r1
5a: 18 95 reti
Is the assembler code right, although C code is empty?
These links explain some things about ISR(), but don't go into detail about which parts of the asm are required, or whether it would be possible to get GCC to optimize away some of the instructions in simple ISRs that don't need them.
ISR() macro
__attribute__((interrupt))
GCC's asm output (https://godbolt.org/z/zzbY5KE3c) uses pseudo-instructions like __gcc_isr 1.
Newer GCC (9.2 on Godbolt) supports -mno-gas-isr-prologues to get GCC to show the real instructions that match the disassembly from Atmel Studio above. So if anyone wants to play with this, something that has an effect in https://godbolt.org/z/q6M518qfP will probably have the same effect in real Atmel Studio.
Upvotes: 0
Views: 443
Reputation: 3918
Is the assembler code right, although C code is empty?
Yes. This is the code for avr-gcc up to and including v7. Newer versions of the compiler might generate more efficient code, see the GCC v8 Release Notes. The reason is this:
When the avr-gcc ABI was devised, the decision was to model R0 and R1 as fixed registers. "Fixed register" means that the compiler won't use them in register allocation or otherwise in any way. The only use of these regs was in the final stage of compilation when assembly code is printed to *.s, where these registers could be used implicitly in the respective output strings. This is basically the same as instruction output via inline assembly, which is opaque to the compiler.
The reason behind this choice was that overall code quality could be improved by having these extra registers at hand, where R0 is used as a temporary register aka. __tmp_reg__, and R1 aka. __zero_reg__ contains a value of zero. For example, to compare a 16-bit integer in register %0 against 42, you can just
cpi %A0, 42
cpc %B0, __zero_reg__
without any further ado, i.e. no need to allocate some temporary register, clear it etc.
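To make the role of __zero_reg__ in that compare concrete, here is a small host-side sketch in plain C that models the two instructions. It is only a toy model under stated assumptions: the names flags_t, model_cpi, model_cpc and compare16_with_42 are made up for this illustration, and only the C (borrow) and Z flags are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two status flags that matter for this compare. */
typedef struct {
    bool carry; /* C: set when the subtraction needed a borrow */
    bool zero;  /* Z: set when the whole multi-byte result is zero */
} flags_t;

/* cpi Rd, K : computes Rd - K, keeps only the flags. */
flags_t model_cpi(uint8_t rd, uint8_t k)
{
    flags_t f;
    f.carry = rd < k;
    f.zero = (uint8_t)(rd - k) == 0;
    return f;
}

/* cpc Rd, Rr : computes Rd - Rr - C. Z can only stay set, it is never newly set. */
flags_t model_cpc(uint8_t rd, uint8_t rr, flags_t in)
{
    flags_t f;
    unsigned sub = (unsigned)rr + (in.carry ? 1u : 0u);
    f.carry = rd < sub;
    f.zero = in.zero && (uint8_t)(rd - sub) == 0;
    return f;
}

/* Compare a 16-bit value against 42 the way the two-instruction sequence does. */
flags_t compare16_with_42(uint16_t x)
{
    const uint8_t zero_reg = 0; /* stands in for R1 aka. __zero_reg__ */
    flags_t f = model_cpi((uint8_t)(x & 0xFF), 42);   /* cpi %A0, 42           */
    return model_cpc((uint8_t)(x >> 8), zero_reg, f); /* cpc %B0, __zero_reg__ */
}
```

In this model, Z ends up set only for x == 42 and C ends up set only for x < 42. The high byte is compared against a register that is already known to hold 0, so no scratch register has to be loaded first.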
The disadvantage of this approach is that there is no usage of life info for these registers, for example in multiplication code like
char mul (char x)
{
return x * x * x * x;
}
you have to reset R1 to 0 according to the ABI because MUL destroys its content:
mul:
mul r24,r24
mov r24,r0
clr r1 ; Superfluous
mul r24,r24 ; Overrides r1
mov r24,r0
clr r1 ; Restore __zero_reg__ to 0
ret
The first clr r1 is superfluous because the following mul will override it.
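The effect of MUL on R1 can be replayed on the host with a tiny register model. This is a sketch for illustration only: regs_t, model_mul and model_mul_listing are hypothetical names, and the model covers just the three registers the listing touches.

```c
#include <stdint.h>

/* Toy register file: only the registers used by the listing. */
typedef struct {
    uint8_t r0;  /* __tmp_reg__ */
    uint8_t r1;  /* __zero_reg__, expected to hold 0 outside of insn output */
    uint8_t r24; /* argument and return value */
} regs_t;

/* mul Rd, Rr : the 16-bit unsigned product is written to R1:R0 (high:low). */
void model_mul(regs_t *r, uint8_t rd, uint8_t rr)
{
    uint16_t p = (uint16_t)((unsigned)rd * rr);
    r->r0 = (uint8_t)(p & 0xFF);
    r->r1 = (uint8_t)(p >> 8);
}

/* Replays the listing step by step; skip_final_clr leaves out the last clr r1. */
regs_t model_mul_listing(uint8_t x, int skip_final_clr)
{
    regs_t r = { 0, 0, x };
    model_mul(&r, r.r24, r.r24); /* mul r24,r24 */
    r.r24 = r.r0;                /* mov r24,r0  */
    r.r1 = 0;                    /* clr r1 (superfluous, overwritten next) */
    model_mul(&r, r.r24, r.r24); /* mul r24,r24 */
    r.r24 = r.r0;                /* mov r24,r0  */
    if (!skip_final_clr)
        r.r1 = 0;                /* clr r1 (restores __zero_reg__) */
    return r;
}
```

For x = 20 the second product is 144 * 144 = 0x5100, so without the final clr the model leaves 0x51 in R1, and any later code that relies on __zero_reg__ being 0 would misbehave.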
That ABI design also led to these expensive ISR prologues and epilogues because no analysis is available on whether R0 and R1 are used or changed; the same holds for SREG. Therefore, a classic ISR prologue has to save R1, R0 and SREG and clear R1 (as in the disassembly from the question), no matter what the body of the ISR is, and the epilogue has to restore them.
Due to the complexity of the problem, it took 12 years from filing PR20296 to its resolution. The bulk of analysis was shifted from the compiler to the assembler by means of a pseudo-instruction __gcc_isr. To see how it works, consider the following C code:
volatile char c;
__attribute__((__signal__))
void __vector_X (void)
{
++c;
}
and the assembly code from avr-gcc v8+ -Os -save-temps:
__vector_X:
__gcc_isr 1
lds r24,c
subi r24,lo8(-1)
sts c,r24
__gcc_isr 2
reti
__gcc_isr 0,r24
What the compiler does:
Don't generate __gcc_isr if Binutils don't support it (determined during configure whether gas accepts -mgcc-isr), if optimization is off, if the ISR is attributed no_gccisr, if -mgas-isr-prologues has been switched off, etc.
Don't generate __gcc_isr if the ISR has open-coded calls or does weird stuff like non-local goto (setjmp / longjmp).
If all goes well, print __gcc_isr pseudo-instructions instead of actual ISR prologue / epilogues.
What the assembler does:
It analyzes the complete ISR code starting at prologue chunk __gcc_isr 1 up to final chunk 0 and records usage of R0, R1 and effects on SREG.
Don't analyze code behind function calls: If [r]call is encountered, assume the worst for R0, R1 and SREG. Tail-calls (calls via some jump instruction) have already been handled by the compiler.
Print optimized prologue for chunk 1 and epilogue(s) for chunk(s) 2 according to R0, R1, SREG usage. The register specified with chunk 0 may be used to push / pop SREG because the compiler uses this register anyways.
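The assembler steps above can be sketched as a small host-side C model. This is not gas's actual implementation, only an illustration of the idea: insn_effect_t, isr_needs_t, analyze_isr and the two example functions are hypothetical names, and the per-instruction effects are filled in by hand instead of being decoded from opcodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hand-written summary of what one instruction does to ISR-relevant state. */
typedef struct {
    bool is_call;      /* rcall / call: the callee is not analyzed */
    bool touches_r0;   /* reads or writes R0 (__tmp_reg__) */
    bool touches_r1;   /* reads or writes R1 (__zero_reg__) */
    bool changes_sreg; /* may modify a status flag */
} insn_effect_t;

/* What the prologue and epilogue have to take care of. */
typedef struct {
    bool r0;
    bool r1;
    bool sreg;
} isr_needs_t;

/* Walk the ISR body and accumulate usage; a call forces the worst case. */
isr_needs_t analyze_isr(const insn_effect_t *body, size_t n)
{
    isr_needs_t needs = { false, false, false };
    assert(body != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (body[i].is_call) {
            needs.r0 = needs.r1 = needs.sreg = true;
            break; /* nothing after this can make it worse */
        }
        needs.r0 = needs.r0 || body[i].touches_r0;
        needs.r1 = needs.r1 || body[i].touches_r1;
        needs.sreg = needs.sreg || body[i].changes_sreg;
    }
    return needs;
}

/* Body of the ++c example: only subi affects SREG, R0 and R1 are untouched. */
isr_needs_t analyze_increment_example(void)
{
    const insn_effect_t body[] = {
        { false, false, false, false }, /* lds  r24,c       */
        { false, false, false, true  }, /* subi r24,lo8(-1) */
        { false, false, false, false }, /* sts  c,r24       */
    };
    return analyze_isr(body, sizeof body / sizeof body[0]);
}

/* Same body with a call in front: everything has to be assumed used. */
isr_needs_t analyze_call_example(void)
{
    const insn_effect_t body[] = {
        { true,  false, false, false }, /* rcall to some function */
        { false, false, false, true  }, /* subi r24,lo8(-1)       */
    };
    return analyze_isr(body, sizeof body / sizeof body[0]);
}
```

In this model the ++c body only needs SREG handling, which matches the shape of the final code for that example, where only SREG and the register named in chunk 0 are saved.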
For the example from above, the final code will be:
<__vector_X>:
8f 93 push r24
8f b7 in r24, 0x3f ; SREG
8f 93 push r24
80 91 60 00 lds r24, 0x0060 ; <c>
8f 5f subi r24, 0xFF
80 93 60 00 sts 0x0060, r24 ; <c>
8f 91 pop r24
8f bf out 0x3f, r24 ; SREG
8f 91 pop r24
18 95 reti
The obvious advantage of letting the assembler do the analysis is that it even works for code from inline assembly, which is opaque to GCC.
The first thing to notice is that we get analysis of inline asm for free with the current approach. Handling of inline asm was not the reason behind the decision to let gas do the work, though; it's just a nice side effect. So what follows is basically a TL;DR of why we use gas as a workhorse.
It's correct that inline asm must make all side effects explicit, with a grain of salt:
Before the cc0→CCmode transition, there was no condition code register to clobber, so the assumption would be that basically each and every insn would clobber cc0. The situation didn't change much with the introduction of CCmode (it actually got worse): Compare insns are setting CC, but almost every other insn besides branches or super-simple 1-instruction insns are clobbering CC.
The reason is that many insns have very complex insn output printers, for example for specific arithmetic or multi-byte loads / stores. It's not possible with any reasonable amount of work to model exact CC behavior on that level, hence just assume a CC clobber. This also applies to inline asm: Since the advent of CCmode, the avr backend just adds "cc" clobbers to all inline assembly so that legacy code won't break, see avr.cc.
Similar situation for tmp_reg: Insn printers will use it implicitly every now and then, so the compiler cannot work out its usage / clobber status with any reasonable precision even if it was an ordinary, allocated register and not a fixed one.
Same for zero_reg, which is also fixed. Some insn printers will just use it in special cases, and it's not possible to model this in a reasonable way, either. As you already noticed, insns (and inline asm) may assume zero_reg = 0, which is the reason why ISR functions with a single
asm ("sts 0,__zero_reg__");
will work and instantiate zero_reg magically.
And of course, it's not possible to add implicit operands like "r" (0) to inline asm, and even if it was possible, this would break existing code. And clobbering R0 or R1 is still void because they are fixed, so we don't want to rely on clobbers being present.
And technically, an inline asm that's clobbering zero_reg and then restores it to 0, does not clobber it. However, ISRs would still need to know.
Upvotes: 1