Reputation: 13
I want to benchmark the number of cycles used by different machine instructions on my system (an ARM Cortex-M4 in this example). I use a macro that repeats the target instruction a number of times, and I read my controller's cycle counter before and after it. In the asm dump, I see that at some position the constant data (the address of my cycle-counter register) is filled in (positions 8003140 to 8003150, marked with ">"):
08002d58 <testThis>:
8002d58: 48fa ldr r0, [pc, #1000] ; (8003144 <testThis+0x3ec>)
8002d5a: 49fb ldr r1, [pc, #1004] ; (8003148 <testThis+0x3f0>)
8002d5c: 4bfb ldr r3, [pc, #1004] ; (800314c <testThis+0x3f4>)
8002d5e: 4afc ldr r2, [pc, #1008] ; (8003150 <testThis+0x3f8>)
8002d60: 6800 ldr r0, [r0, #0]
8002d62: 6008 str r0, [r1, #0]
8002d64: 681b ldr r3, [r3, #0]
8002d66: 6812 ldr r2, [r2, #0]
8002d68: fa82 f183 qadd r1, r3, r2
8002d6c: fa82 f183 qadd r1, r3, r2
..
8003138: fa82 f183 qadd r1, r3, r2
800313c: fa82 f183 qadd r1, r3, r2
> 8003140: e008 b.n 8003154 <testThis+0x3fc>
> 8003142: bf00 nop
> 8003144: e0001004 .word 0xe0001004
> 8003148: 20000598 .word 0x20000598
> 800314c: 20000594 .word 0x20000594
> 8003150: 200002e4 .word 0x200002e4
8003154: fa82 f183 qadd r1, r3, r2
8003158: fa82 f183 qadd r1, r3, r2
..
8003b84: fa82 f183 qadd r1, r3, r2
8003b88: fa82 f383 qadd r3, r3, r2
8003b8c: 4803 ldr r0, [pc, #12] ; (8003b9c <testThis+0xe44>)
8003b8e: 4904 ldr r1, [pc, #16] ; (8003ba0 <testThis+0xe48>)
8003b90: 6003 str r3, [r0, #0]
8003b92: 4b04 ldr r3, [pc, #16] ; (8003ba4 <testThis+0xe4c>)
8003b94: 680a ldr r2, [r1, #0]
8003b96: 601a str r2, [r3, #0]
8003b98: 4770 bx lr
8003b9a: bf00 nop
8003b9c: 2000058c .word 0x2000058c
8003ba0: e0001004 .word 0xe0001004
8003ba4: 20000338 .word 0x20000338
Why is this not filled in at the beginning? Am I able to control this?
GCC version:
gcc version 4.8.3 20140228 (release) [ARM/embedded-4_8-branch revision 208322]
C-Code:
#define READCYCCNT() *((volatile unsigned int *)0xE0001004)
uint32_t cyc_begin, cyc_end;
int c, a, b;
void testThis(void *obj)
{
cyc_begin = READCYCCNT();
REP(9,0,0, c, __QADD, a, b);
cyc_end = READCYCCNT();
}
The REP-macro is a bit lengthy. It just adds 900 calls to
c = __QADD(a,b)
Compiler-call:
arm-atollic-eabi-gcc -c -mthumb -mcpu=cortex-m4 -mfloat-abi=softfp -mfpu=fpv4-sp-d16 -std=gnu90 -DDEBUG=1 -I../Inc -I../CoreSupport -I../DeviceSupport -Ofast -ffunction-sections -fdata-sections -g -Wall -o Application\Main.o ..\Application\Main.c
Upvotes: 1
Views: 145
Reputation: 3203
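The compiler has generated your code using ldr instructions with PC-relative addressing. The 16-bit Thumb encoding of those instructions has only 8 bits to store the offset, counted in words, so they can only access data 0 to 1020 bytes past the word-aligned program counter. The offset in that encoding is always positive, so the data cannot sit before the loads, and your function (900 qadd instructions at 4 bytes each) is far too long for the data to sit at its end and still be in range of the loads at its start. This is why the compiler has placed your data in the middle of the code, with a branch over it. Here's the quick reference card for thumb instructions.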
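To check this against the dump in the question, here is a minimal sketch of how the address behind a 16-bit Thumb `ldr Rt, [pc, #imm]` can be derived. The function names are made up for illustration; the instruction addresses and offsets are taken from the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a 16-bit Thumb "ldr Rt, [pc, #imm]" located at insn_addr.
 * In Thumb state the PC reads as the instruction address + 4, and the
 * literal load rounds that value down to a multiple of 4 before adding imm.
 * (Hypothetical helper, not part of any toolchain.) */
static uint32_t thumb_ldr_literal_target(uint32_t insn_addr, uint32_t imm)
{
    uint32_t pc = insn_addr + 4u;
    uint32_t base = pc & ~(uint32_t)3u;   /* Align(PC, 4) */
    return base + imm;
}

/* Re-derive the four pool addresses that the dump shows for testThis. */
static void check_against_dump(void)
{
    assert(thumb_ldr_literal_target(0x08002d58u, 1000u) == 0x08003144u);
    assert(thumb_ldr_literal_target(0x08002d5au, 1004u) == 0x08003148u);
    assert(thumb_ldr_literal_target(0x08002d5cu, 1004u) == 0x0800314cu);
    assert(thumb_ldr_literal_target(0x08002d5eu, 1008u) == 0x08003150u);
}
```

All four offsets are close to the 1020-byte upper limit of this encoding, which suggests the compiler pushed the pool about as far away from the loads as the 16-bit instruction allows.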
There are several ways to avoid this. You could replace the macro with hand-written assembly that uses a different addressing mode. You could replace the variables with constants and avoid addressing altogether. You could reduce the number of times your macro is called. On a core that also supports the ARM instruction set, you could remove the -mthumb flag to generate 32-bit ARM instructions, which have more bits for addressing; the Cortex-M4 executes only Thumb code, so that option does not apply here. It really depends on what you want to evaluate with your test.
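As a rough guide for the "fewer repetitions" option, the sketch below estimates whether a pool at the end of the function would still be within reach of a 16-bit PC-relative load at its start. It assumes the 0 to 1020 byte range of that encoding; the helper name and the overhead figure are hypothetical, and where GCC actually puts the pool remains its own decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest offset a 16-bit Thumb "ldr Rt, [pc, #imm]" can encode:
 * an 8-bit immediate scaled by 4. */
#define THUMB16_LDR_LITERAL_MAX_OFFSET 1020u

/* True if `count` repeated instructions of `insn_bytes` each, plus
 * `overhead_bytes` of surrounding code (an assumed figure), still fit
 * between the load and a pool placed after them. */
static bool pool_reachable_at_end(uint32_t count, uint32_t insn_bytes,
                                  uint32_t overhead_bytes)
{
    return count * insn_bytes + overhead_bytes <= THUMB16_LDR_LITERAL_MAX_OFFSET;
}
```

With the 4-byte qadd encoding from the dump, 900 repetitions need 3600 bytes and are well out of range, while roughly 250 or fewer would leave room for a single pool after the function body.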
Upvotes: 1