d3Roux

Reputation: 316

How can I generate the following ARM assembler output using ARM gcc 7.3?

myfunction:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mul r3, r0, r0
mov r0, r3
mla r0, r1, r0, r2
bx  lr

I am able to generate everything except the mov instruction using the following C function.

int myfunction(int r0, int r1, int r2, int r3)
{
  r3 = r0*r0;
  r0 = r3;
  r3 = r0;
  return (r1*r3)+r2;
}

How can I get the compiler to emit the mov that copies r3 into r0?

Upvotes: 1

Views: 1340

Answers (1)

old_timer

Reputation: 71566

unsigned int myfunction(unsigned int a, unsigned int  b, unsigned int c)
{
  return (a*a*b)+c;
}

Your choices are going to be something like this (unoptimized):

00000000 <myfunction>:
   0:   e52db004    push    {r11}       ; (str r11, [sp, #-4]!)
   4:   e28db000    add r11, sp, #0
   8:   e24dd014    sub sp, sp, #20
   c:   e50b0008    str r0, [r11, #-8]
  10:   e50b100c    str r1, [r11, #-12]
  14:   e50b2010    str r2, [r11, #-16]
  18:   e51b3008    ldr r3, [r11, #-8]
  1c:   e51b2008    ldr r2, [r11, #-8]
  20:   e0010392    mul r1, r2, r3
  24:   e51b200c    ldr r2, [r11, #-12]
  28:   e0000291    mul r0, r1, r2
  2c:   e51b3010    ldr r3, [r11, #-16]
  30:   e0803003    add r3, r0, r3
  34:   e1a00003    mov r0, r3
  38:   e28bd000    add sp, r11, #0
  3c:   e49db004    pop {r11}       ; (ldr r11, [sp], #4)
  40:   e12fff1e    bx  lr

or this

00000000 <myfunction>:
   0:   e0030090    mul r3, r0, r0
   4:   e0202391    mla r0, r1, r3, r2
   8:   e12fff1e    bx  lr

as you have probably figured out.

The compiler backend should never emit that mov, since it just wastes an instruction: r3 can go straight into the mla, so there is no need to copy it into r0 first. I am not quite sure how to get the compiler to do more. Even this doesn't encourage it:

unsigned int fun ( unsigned int a )
{
    return(a*a);
}
unsigned int myfunction(unsigned int a, unsigned int  b, unsigned int c)
{
  return (fun(a)*b)+c;
}

giving

00000000 <fun>:
   0:   e1a03000    mov r3, r0
   4:   e0000093    mul r0, r3, r0
   8:   e12fff1e    bx  lr

0000000c <myfunction>:
   c:   e0030090    mul r3, r0, r0
  10:   e0202391    mla r0, r1, r3, r2
  14:   e12fff1e    bx  lr

Basically, if you don't optimize you get nowhere near what you were after. If you do optimize, that mov shouldn't be there; it is easy to optimize out.
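To see why the optimizer drops the mov, it may help to compare the function from the question with what is left once the register-to-register copies are folded away. This is only a sketch: the names with_copies and without_copies are made up for illustration, and "copy propagation" is my description of the effect, not a claim about which gcc pass does it.

```c
/* The question's function: the two assignments in the middle only copy
   a value back and forth, so they add no new information. */
int with_copies(int r0, int r1, int r2, int r3)
{
    r3 = r0 * r0;
    r0 = r3;            /* copy r3 into r0 */
    r3 = r0;            /* copy it straight back; r3 is unchanged */
    return (r1 * r3) + r2;
}

/* Roughly what remains after the copies are propagated away
   (hypothetical name, same result for every input). */
int without_copies(int r0, int r1, int r2)
{
    return (r1 * (r0 * r0)) + r2;
}
```

Both functions return the same value for the same r0, r1 and r2, which is why the C variable names have no influence on which registers, or which mov instructions, end up in the output.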

While you can, to some degree, shape high-level code to encourage the compiler to produce particular low-level code, you should not expect to be able to get this exact output.

Unless you use inline asm:

asm
(
   "mul r3, r0, r0\n"
   "mov r0, r3\n"
   "mla r0, r1, r0, r2\n"
   "bx lr\n"
);

giving your result

Disassembly of section .text:

00000000 <.text>:
   0:   e0030090    mul r3, r0, r0
   4:   e1a00003    mov r0, r3
   8:   e0202091    mla r0, r1, r0, r2
   c:   e12fff1e    bx  lr

or real asm

mul r3, r0, r0
mov r0, r3
mla r0, r1, r0, r2
bx lr

and feed it to gcc rather than as (arm-whatever-gcc -c so.s -o so.o)

Disassembly of section .text:

00000000 <.text>:
   0:   e0030090    mul r3, r0, r0
   4:   e1a00003    mov r0, r3
   8:   e0202091    mla r0, r1, r0, r2
   c:   e12fff1e    bx  lr

so that technically you were using gcc on the command line, but gcc only acts as a driver and hands the file to as (a .S file would be run through the C preprocessor first).
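Note that neither asm snippet above defines a label, so the disassembly shows bare .text and nothing can call the code. A minimal sketch of a callable version follows; the .text/.global/label lines are my additions (assuming GNU as syntax), and the #else branch is only there so the file also builds on a non-ARM host or when the compiler defaults to Thumb.

```c
/* Prototype so C callers can use the symbol defined in the asm below. */
unsigned int myfunction(unsigned int a, unsigned int b, unsigned int c);

#if defined(__arm__) && !defined(__thumb__)
/* File-scope (basic) asm. The directives and the label are assumptions
   added so the instruction sequence becomes a callable global symbol. */
__asm__(
    ".text\n"
    ".align 2\n"
    ".global myfunction\n"
    "myfunction:\n"
    "    mul r3, r0, r0\n"      /* r3 = a * a                 */
    "    mov r0, r3\n"          /* the mov the question wants */
    "    mla r0, r1, r0, r2\n"  /* r0 = b * r0 + c            */
    "    bx  lr\n"
);
#else
/* Fallback for other targets: same arithmetic in plain C. */
unsigned int myfunction(unsigned int a, unsigned int b, unsigned int c)
{
    return (a * a * b) + c;
}
#endif
```

Called as myfunction(3, 4, 5), either branch computes 3*3*4 + 5. The compiler never looks inside the asm string, so the mov survives at any optimization level.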

Unless you find a core or bug where Rd and Rs have to be the same register, and can then specify that core/bug/whatever on the gcc command line, I don't see the mov happening. Maybe, just maybe, with clang/llvm: compile fun and myfunction separately to bitcode, combine them, optimize, output to the target, and examine the result. I would hope that either the optimization or the output stage removes the mov, but you might get lucky.

Edit

I made an error:

unsigned int myfunction(unsigned int a, unsigned int  b, unsigned int c)
{
  return (a*a*b)+c;
}

arm-linux-gnueabi-gcc --version
arm-linux-gnueabi-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Disassembly of section .text:

00000000 <myfunction>:
   0:   e0030090    mul r3, r0, r0
   4:   e1a00003    mov r0, r3
   8:   e0202091    mla r0, r1, r0, r2
   c:   e12fff1e    bx  lr

but this

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <myfunction>:
   0:   e0030090    mul r3, r0, r0
   4:   e0202391    mla r0, r1, r3, r2
   8:   e12fff1e    bx  lr

I'll have to build a 7.3 or go find one. Somewhere between 5.x.x and 8.x.x the backend changed or...

Note that you may need -mcpu=arm7tdmi or -mcpu=arm9tdmi or -march=armv4t or -march=armv5t on the command line, depending on the default target (cpu/arch) built into your compiler. Otherwise you might get something like this:

Disassembly of section .text:

00000000 <myfunction>:
   0:   fb00 f000   mul.w   r0, r0, r0
   4:   fb01 2000   mla r0, r1, r0, r2
   8:   4770        bx  lr
   a:   bf00        nop
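If you are not sure what your compiler targets by default, one way to check is to look at the macros it predefines. The sketch below assumes the usual gcc/clang macros __arm__, __thumb__, __thumb2__ and the ACLE macro __ARM_ARCH; the function names default_target and arm_arch are made up for illustration.

```c
/* Describe the instruction set this translation unit is compiled for,
   based on macros that gcc and clang predefine for ARM targets. */
const char *default_target(void)
{
#if defined(__thumb2__)
    return "Thumb-2";                  /* e.g. the mul.w output above */
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";                      /* 32-bit ARM-state encodings */
#else
    return "not a 32-bit ARM target";
#endif
}

/* Architecture version number, or 0 when the macro is not defined. */
int arm_arch(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

Building this once with and once without an explicit -mcpu or -march option shows whether the option actually changes the target.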

this

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

produces

Disassembly of section .text:

00000000 <myfunction>:
   0:   e0030090    mul r3, r0, r0
   4:   e0202391    mla r0, r1, r3, r2
   8:   e12fff1e    bx  lr

So you may have to work backward to find the version where this changed and the gcc source change that caused it, then modify 7.3.0, producing something that is not really 7.3.0 but reports as 7.3.0 and outputs your desired code.

Upvotes: 3
