Reputation: 269
I am using an STM32F4 and trying to write an ASM function that is called from within C. It is called inside a C function that itself runs in an interrupt. I am pushing and popping r4-r7. Do I need to do anything else? My assumption is that r0-r3 don't need pushing. I am also modifying global variables within the ASM function; my guess is that these should be declared volatile. Any tips would be welcome. I have also noticed that the Cortex-M4 instruction set documented by ARM is not the same as the instructions that seem to be available to the GCC compiler. For instance, there is no write-back, i.e. the post-increment form r0,[r1],#4 is illegal. Is there a list of which ASM instructions are permissible? I am assuming the STM32F4 uses Thumb-2.
So far it doesn't seem to be working, and I am wondering what the possible issues could be, apart from errors in the assembly.
Upvotes: 5
Views: 4204
Reputation:
Some answers to your questions are in the "Procedure Call Standard for the ARM Architecture" book. Here is the link.
The book says that the first four registers r0-r3 (and s0-s15 for the FPU) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls). The registers r4-r8, r10 and r11 (and s16-s31 for the FPU) are used to hold the values of a routine's local variables. A subroutine must preserve the contents of these registers.
Now about volatile. I think yes, you must use it to prevent compiler optimizations that can 'break' your program logic.
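To illustrate the volatile point, here is a minimal sketch that builds on a PC. It assumes that phase and dataSize (names taken from the question's code) are 32-bit globals; the types, the handler name audio_irq_handler and the stand-in function get_sine_stand_in are hypothetical and only replace the real assembly routine so the example compiles without the target.

```c
#include <stdint.h>

/* Assumed types: the assembly accesses these with 32-bit ldr/str.
   volatile tells the C compiler that the values can change behind its
   back, so every C access really reads or writes memory. */
volatile uint32_t phase;
volatile uint32_t dataSize;

/* Stand-in for the assembly routine so the sketch builds on a PC.
   On the target the C side would only need a prototype such as:
   extern void get_sine(void); */
void get_sine_stand_in(void)
{
    phase += 0x00100000u;  /* arbitrary phase increment for the demo */
    dataSize += 16u;       /* the routine counts 16 samples per call */
}

/* Hypothetical interrupt handler that calls the routine. */
void audio_irq_handler(void)
{
    get_sine_stand_in();
}

/* Main-loop code: without volatile, an optimizing compiler could keep
   dataSize in a register and never notice the handler's update. */
int samples_pending(void)
{
    return dataSize != 0u;
}
```

Note that volatile only affects the C code that touches these variables. The assembly routine always goes to memory, because it contains explicit load and store instructions.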
And about your sine. English is not my native language, so I do not quite understand what you need, but if you need a fast and precise sine approximation as part of your problem, you might be interested in these articles: http://devmaster.net/forums/topic/4648-fast-and-accurate-sinecosine/ http://www.coranac.com/2009/07/sines/.
And the last thing. I have almost finished my sine approximation function for the Cortex-M4. It uses the FPU, takes about 30 cycles, and gives a zero-error result in single-precision floating point.
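For readers who want a feel for the kind of approximation the linked articles discuss, here is a minimal sketch of a parabola-based sine. It is not the FPU function mentioned above; it is the well-known two-step parabolic fit, and the function name fast_sine and the input range of [-pi, pi] are assumptions of this example.

```c
/* Parabolic sine approximation, valid for x in [-pi, pi].
   Step 1 fits a parabola through 0, +/-pi/2 and +/-pi.
   Step 2 blends in the squared parabola to reduce the error. */
float fast_sine(float x)
{
    const float B = 1.2732395f;    /* 4 / pi     */
    const float C = -0.40528473f;  /* -4 / pi^2  */
    const float P = 0.225f;        /* blend weight for the correction */

    float ax = (x < 0.0f) ? -x : x;   /* |x| without needing math.h */
    float y  = B * x + C * x * ax;    /* first parabola */
    float ay = (y < 0.0f) ? -y : y;   /* |y| */

    return P * (y * ay - y) + y;      /* corrected result */
}
```

The result is exact at 0 and at +/-pi/2 and stays close to the true sine in between, which may be enough for audio work. A lookup table, as used in the question, trades memory for fewer arithmetic operations.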
Upvotes: 1
Reputation: 269
I couldn't answer my own question until 8 hours had passed. Anyway, here is what I got, and it works!
There is quite a bit going on in this function. It is basically a sine wave oscillator that uses a LUT for the sine values. It also uses a table of exponential values, which are selected using an ADC hooked up to a pot for control. There is a 32-bit phase accumulator, which creates a ramp that is then scaled for the lookup. The sine table (most of which I have left out here) holds 16-bit values, with the table size truncated to 14 bits. I am sure there are lots of optimizations to be done in this code, but at least it gets me started.
Each pass of this function generates 16 sine wave samples at 48 kHz and fills a buffer which (outside this function) is transferred by DMA and output through the Discovery board's on-board codec. It sounds very smooth, I must say. The total seems to be around 1200 instruction cycles so far. I have up to 56000 cycles if I need them, so this is pretty good.
One thing I am having difficulty with is scaling the sine output. The sine table holds int16_t values, and I want to be able to multiply them by a fraction to get volume control. So far nothing I've tried works using smul, mul, etc.
@ void get_sine(void)
.align 2 @ Align to word boundary
.global get_sine @ This makes it a real symbol
.thumb_func
.type get_sine STT_FUNC @ Declare get_sine to be a function.
get_sine: @ Start of function definition
push {r4-r7}
ldr r0,=pitch @ get pitch address
ldr r1,=expoLUT @ expo_tab address
ldr r7,[r0,#0] @ pitch val into r7
lsl r7,r7,#2
ldr r7,[r1,r7] @ move lookup expo tab value with r7 into r7
ldr r2,=sineLUT @ sine_tab base addy
ldr r4,=WaveBuffer @ storage array addy
ldr r5,=writePos @ get writepos addr
mov r6,#0 @ clear increment r6
outloop:
ldr r3,=phase @ phase address to r3
ldr r1,[r3,#0] @ get current phase
add r1,r1,r7 @ add current phase and ph_inc
str r1,[r3,#0] @ store phase
lsr r0,r1,#18 @ shift it right by 18 into r0 for sine_tab lookup
lsl r0,r0,#2 @ align it
ldr r0,[r2,r0] @ lookup sine val with r0 into r0
lsl r1,r0,#16 @ shift to left channel
add r0,r0,r1 @ add right channel
ldr r1,[r5,#0] @ get writePos
push {r1} @ push it before align
lsl r1,r1,#2 @ align address 4
str r0,[r4,r1] @ store sine to WaveBuffer
pop {r1} @ pop writepos back
add r1,r1,#1 @ increment array pointer writepos
ldr r3,=1024 @ load BUFFERSIZE compare
cmp r1,r3 @ skip if less than BUFFERSIZE
bne skip
mov r1,#0 @ clr writepos if >=BUFFERSIZE
skip:
str r1,[r5,#0] @ store writepos value
add r6,r6,#1 @ increment loop counter
ldr r0,=dataSize @ get datasize counter addr
ldr r1,[r0,#0] @ get val
add r1,r1,#1 @ increment datasize counter
str r1,[r0,#0] @ store counter
cmp r6,#16 @ compare with 16 (i=0;i<16;i++)
bne outloop
pop {r4-r7}
bx lr
.section .rodata
sineLUT:
@ Array goes in here. Type can be .byte, .hword or .word
@ NOTE! No comma at the end of a line! This is important
.word 0x0000,0x000c,0x0018,0x0024,0x0030,0x003c,0x0048,0x0054
.word 0x0064,0x0070,0x007c,0x0088,0x0094,0x00a0,0x00ac,0x00bc
.word 0x00c8,0x00d4,0x00e0,0x00ec,0x00f8,0x0104,0x0114,0x0120
.word 0x012c,0x0138,0x0144,0x0150,0x015c,0x016c,0x0178,0x0184
.word 0x0190,0x019c,0x01a8,0x01b4,0x01c4,0x01d0,0x01dc,0x01e8
.word 0x01f4,0x0200,0x020c,0x021c,0x0228,0x0234,0x0240,0x024c
.word
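As a cross-check for the assembly above, here is a minimal C sketch of the same per-sample steps, with a Q15 fixed-point gain added as one possible way to do the volume scaling. The helper names, the gain format and the int16_t table type are assumptions of this sketch and not part of the original code. The shift of 18 and the buffer size of 1024 are taken from the listing.

```c
#include <stdint.h>

#define BUFFERSIZE  1024u  /* same wrap value the assembly compares against */
#define PHASE_SHIFT 18u    /* 32-bit phase -> 14-bit table index */

/* Top 14 bits of the phase accumulator select the table entry. */
uint32_t phase_to_index(uint32_t phase)
{
    return phase >> PHASE_SHIFT;
}

/* Multiply a sample by a fraction in Q15 format (16384 = 0.5, 32767 ~ 1.0).
   The product needs 32 bits; the shift must be arithmetic (sign-preserving),
   which is what GCC generates for signed operands. */
int16_t scale_q15(int16_t sample, int16_t gain_q15)
{
    int32_t product = (int32_t)sample * (int32_t)gain_q15;
    return (int16_t)(product >> 15);
}

/* Put the same sample in both 16-bit halves of one 32-bit word.
   Masking to 16 bits first keeps negative samples from spilling
   sign bits into the upper half. */
uint32_t pack_stereo(int16_t sample)
{
    uint32_t half = (uint16_t)sample;
    return (half << 16) | half;
}

/* Advance the write position and wrap at the end of the buffer. */
uint32_t next_write_pos(uint32_t pos)
{
    pos += 1u;
    return (pos == BUFFERSIZE) ? 0u : pos;
}

/* One pass: 16 samples, mirroring the loop structure of the assembly.
   The caller supplies the sine table and the state variables. */
void render_16(const int16_t *sine_table, uint32_t phase_inc, int16_t gain_q15,
               uint32_t *phase, uint32_t *write_pos, uint32_t *out)
{
    for (int i = 0; i < 16; i++) {
        *phase += phase_inc;
        int16_t s = sine_table[phase_to_index(*phase)];
        out[*write_pos] = pack_stereo(scale_q15(s, gain_q15));
        *write_pos = next_write_pos(*write_pos);
    }
}
```

If the scaling is done in assembly, the same pattern would be a signed multiply followed by an arithmetic shift right (asr, not lsr) by 15. Using a logical shift, or adding an unmasked negative value into the upper half when packing the channels, are two places where this kind of code commonly goes wrong.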
Upvotes: 2