Reputation: 67
I'm learning arctangent implementation in TMS320C55x this is the source code:
;* AR0 assigned to _x
;* AR1 assigned to _r
;* T0 assigned to _nx
PSH T3
|| BSET FRCT ;fractional mode
SUB #1, T0 ;nx-1
MOV T0, BRC0 ;repeat nx times
MOV #2596 << #16, AC3 ; AC3.Hi = C5
MOV #-9464 << #16, AC1 ; AC1.Hi = C3
MOV #32617 << #16, AC2 ; AC2.Hi = C1
*
* Note: loading T3 on the instruction before a multiply that uses it will
* cause a 1-cycle delay.
*
MPYMR T3=*AR0+, AC3, AC0 ; (Prime the Pump)
|| RPTBLOCAL loop1-1
MACR AC0, T3, AC1, AC0
MPYR T3, AC0
||MOV *AR0+, T1 ; (for next iteration)
MACR AC0, T3, AC2, AC0
MPYR T3, AC0
||MOV T1, T3
MOV HI(AC0), *AR1+ ;save result
||MPYR T1, AC3, AC0 ; (for next iteration)
loop1:
POP T3
|| BCLR FRCT ;return to standard C
MOV #0, T0 ;return OK value (no possible error)
|| RET
where _x is vector of input and _r is output. nx is the number of elements. The question is about constants that assigns to AC3, AC1, AC2. I guess it is coefficients for polynomial approximation but i don't understand how to calculate them
Upvotes: 3
Views: 102
Reputation: 17668
I do not follow the assembly code, but I could guess where those magic coefficients come from.
The code comments suggest C1
, C3
, C5
are coefficients of a polynomial approximation, and arctan
is an odd function, so its Taylor expansion around 0
has indeed only odd powers of x
. Comparing C1 = 32617
to 1
in the Taylor expansion y = x - 1/3 x^3 + 1/5 x^5 - 1/7 x^7 + ...
, and given the computational context, this further suggests that the result of the calculation is scaled by 2^15 = 32768
.
It turns out that y = (32617 x - 9464 x^3 + 2596 x^5) / 32768
is in fact a pretty good approximation of arctan(x)
over the interval [-1, 1]
. As shown below (verified in wolfram alpha) the largest absolute error of the approximation is less than 1/1000
, and is negligible at the endpoints x = ±1
corresponding to y = ±π/4
, which is probably desirable in graphics calculations.
As to how the coefficients were actually derived, a crude polynomial best-fit using just 9 control points gives a polynomial y = 32613 x - 9443 x^3 + 2573 x^5
with coefficients already close to the ones used in the posted code. More control points and/or additional conditions to minimize the error at the end points would result in slightly different coefficients, but it's hard to guess how to exactly match the ones in the code without any documentation or clues about the optimization criteria being actually used there.
Upvotes: 5