Reputation: 269

ASM function from C ARM embedded

I am using an STM32F4 and trying to write an ASM function that is called from C. The function is called inside a C function, which itself runs in an interrupt. I am pushing and popping r4-r7. Do I need to do anything else? My assumption is that r0-r3 don't need pushing. I am also modifying global variables within the ASM function; my guess is that these should be declared volatile. Any tips would be welcome.

I have also noticed that the Cortex-M4 instruction set outlined by ARM is not the same as the instructions that seem to be available with the GCC toolchain. For instance, write-back addressing is rejected: post-increment such as r0,[r1],#4 is reported as illegal. Is there a list of which ASM instructions are permissible? I am assuming the STM32F4 uses Thumb-2.

So far it doesn't seem to be working, and I am wondering what the possible issues could be, apart from errors in the assembly itself.

Upvotes: 5

Answers (2)

user1312703

Reputation:

Some answers to your questions are in the "Procedure Call Standard for the ARM Architecture" book. Here is the link.

The book says that the first four registers, r0-r3 (and s0-s15 for the FPU), are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls). The registers r4-r8, r10 and r11 (and s16-s31 for the FPU) are used to hold the values of a routine's local variables. A subroutine must preserve the contents of these registers.
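Because the question's routine runs inside an interrupt, it may help to see what the Cortex-M core already saves on exception entry. The sketch below describes that stacked frame as C structs (the struct names are hypothetical; the layout follows the ARMv7-M exception model): the hardware stacks r0-r3, r12, lr, the return address and xPSR, plus s0-s15 and FPSCR when the FPU context is active. Those are the caller-saved registers, so an interrupt handler can be an ordinary AAPCS C function, and an asm routine it calls only has to preserve the callee-saved set, exactly as it would outside an interrupt.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed automatically by the Cortex-M core on exception entry,
   lowest address first. The core may add one padding word above it to keep
   the stack 8-byte aligned. */
typedef struct {
    uint32_t r0, r1, r2, r3;  /* argument / scratch registers */
    uint32_t r12;             /* intra-procedure scratch register */
    uint32_t lr;              /* link register of the interrupted code */
    uint32_t pc;              /* return address */
    uint32_t xpsr;            /* program status register */
} exc_frame_basic;

/* Extended frame used when the FPU context is active: the basic frame
   followed by s0-s15, FPSCR and one reserved word. */
typedef struct {
    exc_frame_basic core;
    uint32_t s[16];           /* s0-s15: caller-saved FPU registers */
    uint32_t fpscr;
    uint32_t reserved;
} exc_frame_fpu;

_Static_assert(sizeof(exc_frame_basic) == 32, "basic frame is 8 words");
_Static_assert(sizeof(exc_frame_fpu) == 104, "extended frame is 26 words");
_Static_assert(offsetof(exc_frame_basic, pc) == 24, "return address is the 7th word");

/* Not in either frame, so a called routine must save them itself if it uses them:
   r4-r8, r10, r11 (r9 depending on the platform), and s16-s31 if it uses the FPU. */
```

For get_sine in the self-answer below, which only touches r0-r7 and no FPU registers, this means saving r4-r7 and keeping the stack balanced is sufficient.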

Now about volatile: I think yes, you must use it to prevent compiler optimizations that can break your program logic, in particular for globals that the interrupt code changes while the main loop reads them.
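As an illustration, the C side of the question's code might declare the globals that the asm reaches through ldr rX,=symbol like this. This is a minimal sketch: the 32-bit types are an assumption based on the word-sized ldr/str the asm uses, BUFFERSIZE mirrors its 1024 compare, and samples_ready is a hypothetical main-loop helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUFFERSIZE 1024u  /* matches the 1024 compare in get_sine */

/* The asm routine follows the AAPCS, so a plain prototype is all C needs. */
extern void get_sine(void);

/* Globals the asm reads and writes by address. volatile makes every C access a
   real load or store instead of a value cached in a register. */
volatile uint32_t pitch;                   /* set from the ADC / pot reading */
volatile uint32_t phase;                   /* 32-bit phase accumulator */
volatile uint32_t writePos;                /* next free slot in WaveBuffer */
volatile uint32_t dataSize;                /* running sample count */
volatile uint32_t WaveBuffer[BUFFERSIZE];  /* packed stereo samples for the DMA */
extern const uint32_t expoLUT[];           /* defined elsewhere; size not given */

/* Main-loop side: re-reads dataSize from memory on every call, so a loop such as
   while (!samples_ready(n)) {} sees updates made by the interrupt. */
static inline bool samples_ready(uint32_t needed)
{
    return dataSize >= needed;
}
```

Note that volatile does not make a read-modify-write such as dataSize++ atomic: if both the interrupt and the main loop modify the same variable, the main-loop update still needs the interrupt briefly disabled around it.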

And about your sine: English is not my native language, so I do not quite understand what you need, but if you need a fast and precise sine approximation as part of your problem, you might be interested in these articles: http://devmaster.net/forums/topic/4648-fast-and-accurate-sinecosine/ http://www.coranac.com/2009/07/sines/.

One last thing: I have almost finished my own sine approximation function for the Cortex-M4. It uses the FPU, takes about 30 cycles and gives a zero-error result across the single-precision floating-point range.

Upvotes: 1

Bruce Duncan

Reputation: 269

Couldn't answer my own question until I had waited 8 hours? Anyway, here is what I got, and it works! There is quite a bit going on in this function. It is basically a sine wave oscillator that uses a LUT for the sine values. It also uses a table of exponential values, looked up with the pitch value that an ADC hooked up to a pot provides for control. A 32-bit phase accumulator creates a ramp that is then scaled down for the lookup. The sine table (only a little of which is included below) holds 16-bit values, and the phase is truncated to a 14-bit index, so the table has 2^14 entries. I am sure there are lots of optimizations to be made in this code, but at least it gets me started.

Each pass of this function generates 16 sine wave samples at 48 kHz and fills a buffer which (outside this function) is handed to the DMA and output through the Discovery board's on-board codec. It sounds very smooth, I must say. The total so far seems to be around 1200 cycles, and I have up to 56000 cycles available if I need them, so this is pretty good.

One thing I am having difficulty with is scaling the sine output. The sine table holds int16_t values, and I want to multiply them by a fraction to get volume control. So far nothing I've tried works, using smul, mul, etc.
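For the volume question, one common approach (a suggestion, not something from this thread) is Q15 fixed point: store the fraction as an integer gain where 0x8000 means 1.0, multiply the sign-extended 16-bit sample by it in 32 bits, then shift right arithmetically by 15. In the asm that is a 32-bit multiply followed by asr #15 (not lsr), done before the sample is copied into both channels. One possible reason a multiply gives garbage: if the table stores 16-bit values in .word slots, a negative sample loads zero-extended and would need sign-extending (e.g. with sxth) first. A minimal C sketch, with hypothetical names:

```c
#include <stdint.h>

#define Q15_ONE 0x8000u  /* 1.0 in Q15: a gain of Q15_ONE leaves the sample unchanged */

/* Scale a signed 16-bit sample by a Q15 gain (0 = silence, Q15_ONE = full volume).
   The 16x16-bit product always fits in 32 bits, so nothing is lost before the shift. */
static inline int16_t scale_q15(int16_t sample, uint16_t gain_q15)
{
    if (gain_q15 > Q15_ONE)
        gain_q15 = Q15_ONE;  /* clamp to unity so the result stays in int16_t range */
    int32_t product = (int32_t)sample * (int32_t)gain_q15;
    return (int16_t)(product >> 15);  /* arithmetic shift with GCC on ARM */
}

/* Duplicate one sample into both halves of a 32-bit word (left channel in the upper
   halfword, as in get_sine), masking first so the sign bits of a negative sample
   cannot spill into the other channel. */
static inline uint32_t pack_stereo(int16_t s)
{
    uint32_t u = (uint16_t)s;
    return (u << 16) | u;
}
```

For example, scale_q15(1000, 0x4000) halves the sample to 500, and pack_stereo(scale_q15(x, gain)) yields the word to store in WaveBuffer.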

    @ void get_sine(void)
        .syntax unified             @ Unified syntax: enables Thumb-2 forms such as [r1],#4 write-back
        .thumb                      @ Cortex-M4 executes Thumb/Thumb-2 only
        .text
        .align 2                    @ Align to word boundary (2^2 bytes)
        .global get_sine            @ Export the symbol so C code can call it
        .thumb_func
        .type get_sine STT_FUNC     @ Declare get_sine to be a function

    get_sine:                       @ Start of function definition
        push    {r4-r7}             @   r4-r7 are callee-saved under the AAPCS
        ldr     r0,=pitch           @   get pitch address
        ldr     r1,=expoLUT         @   expoLUT address
        ldr     r7,[r0,#0]          @   pitch value into r7
        ldr     r7,[r1,r7,lsl #2]   @   r7 = expoLUT[pitch] (phase increment)

        ldr     r2,=sineLUT         @   sineLUT base address
        ldr     r4,=WaveBuffer      @   output buffer address
        ldr     r5,=writePos        @   writePos address
        ldr     r3,=phase           @   phase address (loop invariant)
        mov     r6,#0               @   clear loop counter r6

    outloop:
        ldr     r1,[r3,#0]          @   get current phase
        add     r1,r1,r7            @   phase += phase increment
        str     r1,[r3,#0]          @   store phase
        lsr     r0,r1,#18           @   top 14 bits of the phase = sineLUT index
        ldr     r0,[r2,r0,lsl #2]   @   r0 = sineLUT[index]
        lsl     r1,r0,#16           @   copy the sample into the left-channel halfword
        add     r0,r0,r1            @   add right channel
        ldr     r1,[r5,#0]          @   get writePos
        str     r0,[r4,r1,lsl #2]   @   WaveBuffer[writePos] = stereo sample
        add     r1,r1,#1            @   writePos++
        cmp     r1,#1024            @   compare with BUFFERSIZE
        blo     skip                @   keep writePos while it is < BUFFERSIZE
        mov     r1,#0               @   wrap writePos to 0 once it reaches BUFFERSIZE

    skip:
        str     r1,[r5,#0]          @   store writePos value
        add     r6,r6,#1            @   increment loop counter
        ldr     r0,=dataSize        @   get dataSize counter address
        ldr     r1,[r0,#0]          @   get value
        add     r1,r1,#1            @   increment dataSize counter
        str     r1,[r0,#0]          @   store counter
        cmp     r6,#16              @   compare with 16 (for i=0;i<16;i++)
        bne     outloop
        pop     {r4-r7}
        bx      lr



    .section .rodata
        .align 2                    @ Word-align the table for the 32-bit loads
        sineLUT:
        @ Array goes in here. Entries must be .word: get_sine indexes the table in 4-byte steps
        @ NOTE! No comma at the end of a line! This is important

    .word   0x0000,0x000c,0x0018,0x0024,0x0030,0x003c,0x0048,0x0054
    .word   0x0064,0x0070,0x007c,0x0088,0x0094,0x00a0,0x00ac,0x00bc
    .word   0x00c8,0x00d4,0x00e0,0x00ec,0x00f8,0x0104,0x0114,0x0120
    .word   0x012c,0x0138,0x0144,0x0150,0x015c,0x016c,0x0178,0x0184
    .word   0x0190,0x019c,0x01a8,0x01b4,0x01c4,0x01d0,0x01dc,0x01e8
    .word   0x01f4,0x0200,0x020c,0x021c,0x0228,0x0234,0x0240,0x024c
    .word
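
To check the loop's arithmetic off-target, a plain-C model of get_sine can be compiled on a PC and compared with what the asm writes. This is a sketch under assumptions taken from the asm: a 2^14-entry sineLUT of 32-bit words and a 1024-entry WaveBuffer. The osc_state struct and the get_sine_model name are hypothetical, and the phase increment is passed in directly instead of being looked up as expoLUT[pitch].

```c
#include <stdint.h>

#define BUFFERSIZE       1024u  /* matches the 1024 compare in the asm */
#define SINE_INDEX_BITS  14u    /* phase >> 18 leaves a 14-bit table index */
#define SAMPLES_PER_CALL 16u    /* the asm loops 16 times per call */

/* Hypothetical stand-in for the globals phase, writePos and dataSize. */
typedef struct {
    uint32_t phase;     /* 32-bit phase accumulator, wraps modulo 2^32 */
    uint32_t writePos;  /* next slot in the output buffer */
    uint32_t dataSize;  /* running count of samples produced */
} osc_state;

/* sineLUT: 1 << SINE_INDEX_BITS entries; out: BUFFERSIZE entries. */
void get_sine_model(osc_state *st, uint32_t phase_inc,
                    const uint32_t *sineLUT, uint32_t *out)
{
    for (unsigned i = 0; i < SAMPLES_PER_CALL; i++) {
        st->phase += phase_inc;                              /* advance the ramp */
        uint32_t index  = st->phase >> (32u - SINE_INDEX_BITS);
        uint32_t sample = sineLUT[index];
        out[st->writePos] = (sample << 16) + sample;         /* same packing as the asm */
        st->writePos++;
        if (st->writePos >= BUFFERSIZE)                      /* circular buffer wrap */
            st->writePos = 0;
        st->dataSize++;
    }
}
```

Because the accumulator wraps once every 2^32 / phase_inc samples, the output frequency is phase_inc × 48000 / 2^32 Hz at a 48 kHz sample rate; for 440 Hz that works out to a phase increment of about 39,370,534, which is the kind of value expoLUT would need to hold.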

Upvotes: 2
