Reputation: 4026
m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CFLAGS = -Wall -Werror -ffreestanding -nostdlib -O2 -m68000 -mshort
I am very confused why gcc generates such (seemingly) non-optimal code for a simple for loop over a const array.
const unsigned int pallet[16] = {
0x0000,
0x00E0,
0x000E,
...
0x0000
};
...
volatile unsigned long * const VDP_DATA = (unsigned long *) 0x00C00000;
...
for(int i = 0; i < 16; i++) {
*VDP_DATA = pallet[i];
}
Results in:
296: 41f9 0000 037e lea 37e <pallet+0x2>,%a0
29c: 223c 0000 039c movel #924,%d1
2a2: 4240 clrw %d0
2a4: 0280 0000 ffff andil #65535,%d0
2aa: 23c0 00c0 0000 movel %d0,c00000 <_etext+0xbffc2c>
2b0: b288 cmpl %a0,%d1
2b2: 6712 beqs 2c6 <main+0x46>
2b4: 3018 movew %a0@+,%d0
2b6: 0280 0000 ffff andil #65535,%d0
2bc: 23c0 00c0 0000 movel %d0,c00000 <_etext+0xbffc2c>
2c2: b288 cmpl %a0,%d1
2c4: 66ee bnes 2b4 <main+0x34>
My main concern:
Why the useless first element compare at 2b0? This will never hit and never gets branched back to. It just ends up being duplicate code all for the first iteration.
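For anyone who finds the listing hard to follow, here is one reading of its control flow written back as C. It is a sketch, not compiler output: the name write_palette_as_compiled and its parameters are hypothetical, and uint16_t/uint32_t are assumed stand-ins for unsigned int/unsigned long under -mshort. The first write appears to be peeled out of the loop, so the check right after it compares &pal[1] with the end pointer, which cannot match for a 16-entry table.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the listing: p plays the role of %a0,
   end the role of %d1. Requires count >= 1. */
void write_palette_as_compiled(volatile uint32_t *port,
                               const uint16_t *pal, unsigned count)
{
    const uint16_t *p = pal + 1;        /* lea pallet+0x2,%a0 */
    const uint16_t *end = pal + count;  /* movel #924,%d1 */

    *port = pal[0];                     /* peeled first iteration */
    if (p == end)                       /* cmpl/beqs at 2b0: false when count is 16 */
        return;
    do {
        *port = *p++;                   /* movew %a0@+,%d0; andil; movel */
    } while (p != end);                 /* cmpl/bnes at 2c2 */
}
```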
-O3 simply unrolls the loop, which I don't want either, as space is a bigger concern than speed at this part of the code. I would have expected something closer to this:
lea pallet,%a0
movel #7,%d0
1:
movel %a0@+,c00000
dbra %d0,1
I get that I have to be a bit more explicit in my code to get it to write in long chunks. My main point here is: how come gcc can't seem to figure out my intentions, i.e. that I just want to dump this data into this address?
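To make the "more explicit" version concrete, here is a minimal sketch of writing the table in long chunks from C. The name write_palette_long and its parameters are hypothetical, and uint16_t/uint32_t are assumed in place of unsigned int/unsigned long so the widths do not depend on -mshort. Whether GCC turns the count-down loop into dbra, or merges the two word loads into one long load, is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper: sends `words` 16-bit entries as words/2 32-bit writes.
   `words` must be even and non-zero. */
void write_palette_long(volatile uint32_t *port,
                        const uint16_t *pal, unsigned words)
{
    unsigned n = words / 2;             /* 16 words -> 8 long writes */
    do {
        /* The first word goes in the high half, which is what a long
           load from the table yields on the big-endian 68000. */
        uint32_t pair = ((uint32_t)pal[0] << 16) | pal[1];
        *port = pair;
        pal += 2;
    } while (--n != 0);                 /* count-down form, the shape dbra fits */
}
```

On the target, the call would look like write_palette_long(VDP_DATA, pallet, 16), assuming VDP_DATA is declared with a matching 32-bit pointer type.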
Another observation: clrw %d0 → andil #65535,%d0 → movel %d0,c00000. Why not just clrl and move?
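Some context on where the andil comes from: with -mshort, unsigned int is 16 bits, while the unsigned long behind VDP_DATA is 32 bits, so every store first has to zero-extend the palette entry. The andil #65535 looks like exactly that widening step. A minimal sketch of the conversion, using the hypothetical name widen_entry and assuming uint16_t/uint32_t in place of the -mshort types:

```c
#include <stdint.h>

/* Hypothetical helper: the same implicit conversion that happens in
   *VDP_DATA = pallet[i] when the element is 16 bits and the port is 32 bits. */
uint32_t widen_entry(uint16_t entry)
{
    return entry;   /* zero-extension: the upper 16 bits become 0 */
}
```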
Upvotes: 3
Views: 349
Reputation: 812
I've been playing with GCC and 68k code generation, and I've found that it simply can't generate decent code for the 68k family any more, particularly not for the 68000.
The code is correct, but only barely, and it is not optimized (or should I say, it seems to be DE-optimized?). You should first try -Os instead of -O2. Even then you'll encounter lots of useless instructions in the generated code.
My speculation is that while GCC's support for current architectures moves forward quickly, the 68k backend is not properly maintained and is simply kept correct with minimal effort.
Upvotes: 1