Reputation: 411
I'm creating a program in Fortran that takes in an x for sin(x) in radians, then the number of terms to compute.
This is my program:
! Sine value using MacLaurin series
program SineApprox
    implicit none
    integer :: numTerms, denom, i
    double precision :: x, temp, summ

    ! Read Angle in Radians, and numTerms
    read(*,*) x, numTerms

    ! Computing MacLaurin series
    denom = (2*numTerms)-1   ! Gives denominator
    temp = x                 ! Temp calculates each term
    summ = x
    do i = 3, denom, 2
        temp = (((-temp)*(x**2))/(i*(i-1)))
        summ = summ + temp
    end do
    print *, summ
end program SineApprox
However, I'm not getting the same value that my prof requires for the input: 5 30
The output of my code is:
-0.95892427466314001
But, the required output is:
-0.95892427466313568
^^^^
I can't figure out where the error is.
Upvotes: 3
Views: 532
Reputation: 26481
In contrast to what was demonstrated in both Chux's and aka.nice's answers, and mentioned in various comments, I believe we jumped to conclusions by focusing only on a single computation, i.e. sin(5). Even though the OP's answer is closer to the true value, it must be said that both the OP's algorithm and the factorial algorithm are equally bad for computing sin(5); in the end, however, the factorial algorithm is the better one.
This answer will not go into too much detail about floating-point arithmetic. An excellent monograph on this topic is What Every Computer Scientist Should Know About Floating-Point Arithmetic. Neither will I open the can of worms of Intermediate Floating-Point Precision.
Disclaimer: I don't want to claim that what I write here is 100% correct, and I am certainly not an authority in this field. I just found the question extremely interesting and tried to get as much out of it as I could. I obviously welcome any comment!
Some background concepts that lead up to this:
floating-point representation (FPR): in a finite floating-point representation, a number is written as ±0.d1d2...dp x b^e. It has a precision p and an even base b, where all digits di are integers with 0 ≤ di < b. It is straightforward that such a representation cannot represent all real numbers. For example, if b = 10 and p = 3, the number 1/3 is approximated as 0.333 x 10^0, and if b = 2 and p = 24 one has to approximate 0.1 as 0.110011001100110011001101 x 2^-3.
absolute error (AE): if a real number x has an AE of Δx, then we know that the true value x̃ of x lies between x - Δx ≤ x̃ ≤ x + Δx. If x is the closest FPR to the true value x̃, then we know that its AE is ((b/2) b^(-p-1)) x b^e. For example, if b = 10 and p = 3, the number 1/3 is approximated as 0.333 x 10^0 and has an absolute error of 0.0005 x 10^0, indicating that 0.3325 x 10^0 ≤ 1/3 ≤ 0.3335 x 10^0.
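To make the AE bound tangible for the binary64 case used below (b = 2, p = 53), here is a minimal Fortran sketch (the program name and printed labels are my own) that prints the stored FPR of 0.1 and its maximum absolute error, i.e. half a unit in the last place, via the standard spacing intrinsic:

program fpr_demo
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: x
    x = 0.1_dp
    print *, 'stored FPR of 0.1 :', x             ! nearest binary64 value
    print *, 'ULP of that value :', spacing(x)    ! b^(e-p) = 2^(e-53)
    print *, 'max AE (half ULP) :', spacing(x)/2  ! ((b/2) b^(-p-1)) x b^e
end program fpr_demo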
For a standard IEEE binary64 (double precision) computation (b = 2 and p = 53) of the Taylor-MacLaurin series of sin(5), one quickly sees that the horse has already bolted at the 2nd and 3rd term. Here, the FPR is only accurate up to 2^-49, as can be seen in the table below (representing the nearest binary64 representation of the true fractions):
 j   term           FPR                      e   AE
                    0.12345678901234567
 1    5^1/1!        5.00000000000000000E+00  3   2^-51   4.4E-16
 3   -5^3/3!       -2.08333333333333321E+01  5   2^-49   1.8E-15
 5    5^5/5!        2.60416666666666679E+01  5   2^-49   1.8E-15
 7   -5^7/7!       -1.55009920634920633E+01  4   2^-50   8.9E-16

 j   sum            FPR                      e   AE
                    0.12345678901234567
 1    5             5.00000000000000000E+00  3   2^-51   4.4E-16
 3   -95/6         -1.58333333333333339E+01  4   2^-50   8.9E-16
 5    245/24        1.02083333333333339E+01  4   2^-50   8.9E-16
 7   -5335/1008    -5.29265873015873023E+00  3   2^-51   4.4E-16
The accuracy up to 2^-49 can be understood in the following way. If you look at the term 5^3/3!, then the nearest FPR of this number is the fraction (5864062014805333 / 9007199254740992) * 2^5. As you see, we are missing a part here, namely:

5^3/3! - (5864062014805333 / 9007199254740992) * 2^5
  = 1/844424930131968
  ~ 1.1842378929335002E-15 < 2^-49
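That integer fraction can be recovered in Fortran itself with the fraction, exponent, and digits intrinsics, the same decomposition this answer uses later for the inverse factorial. A minimal sketch (program name and prints are my own):

! Sketch: decompose the binary64 value of 5^3/3! into a * 2^(e-p).
program term_decompose
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: t
    integer(int64) :: a
    t = 125.0_dp / 6.0_dp                          ! 5^3/3! rounded to binary64
    a = int(scale(fraction(t), digits(t)), int64)  ! integer significand
    print *, 'a     =', a                          ! 5864062014805333
    print *, 'e - p =', exponent(t) - digits(t)    ! -48, i.e. t = a * 2^-48
end program term_decompose

Note that a * 2^-48 is exactly (5864062014805333 / 9007199254740992) * 2^5 from the text above.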
So no matter what we try to do, we will always miss this part; it is not possible to compute sin(5) with an accuracy better than 2^-49, as this is the accuracy of the biggest term (in absolute value) that is added to the sum. However, the other terms also introduce errors, and all these errors accumulate. So after the first 30 terms, we know that the AE for sin(5) accumulates to:

2^-51 + 2^-49 + 2^-49 + 2^-50 + ... = 5.45594...E-15

This number is, however, too idealistic, as there is more loss of precision due to loss of significance.
The two presented algorithms are as follows (assuming all variables are IEEE binary64 (double precision) variables, with the exception of the integer loop variables i and j):
algorithm 1: this is a slightly adapted version of the algorithm presented here:
fact = 1.0_dp; pow = x; sum = x; xx = x*x
do i = 2, 30
    j = 2*i - 1
    fact = fact*j*(j-1)   ! running factorial (2i-1)!
    pow  = -pow*xx        ! running signed power (-1)^(i-1) * x^(2i-1)
    term = pow/fact       ! a single division: term is rounded only once
    sum  = sum + term
end do
algorithm 2: this is a slightly adapted version of the algorithm presented here:
term = x; sum = x; xx = x*x
do i = 2, 30
    j = 2*i - 1
    term = (-term*xx)/(j*(j-1))   ! reuses the already-rounded previous term
    sum  = sum + term
end do
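For completeness, here is a self-contained driver (my own wrapper; the program name and the dp parameter are assumptions, the loop bodies are the fragments above) that runs both algorithms for x = 5 and prints the two sums side by side:

! Sketch: run algorithm 1 and algorithm 2 side by side for x = 5.
program compare_algorithms
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: x, xx, fact, pow, term, sum1, sum2
    integer :: i, j
    x = 5.0_dp; xx = x*x
    ! algorithm 1
    fact = 1.0_dp; pow = x; sum1 = x
    do i = 2, 30
        j = 2*i - 1
        fact = fact*j*(j-1)
        pow  = -pow*xx
        sum1 = sum1 + pow/fact
    end do
    ! algorithm 2
    term = x; sum2 = x
    do i = 2, 30
        j = 2*i - 1
        term = (-term*xx)/(j*(j-1))
        sum2 = sum2 + term
    end do
    print *, 'algorithm 1:', sum1
    print *, 'algorithm 2:', sum2
end program compare_algorithms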
While both algorithms are mathematically the same, there is a subtle numerical difference. In floating-point arithmetic, divisions and multiplications are considered to be safe, that is, a single operation always returns the nearest FPR of the true result for the given (possibly already rounded) inputs. This, however, does not mean that a chain of divisions and multiplications leads to the nearest FPR of the full computation.
This is why algorithm 2 is slightly worse than algorithm 1. The computation term = (-term*xx)/(j*(j-1)) contains multiple multiplications and divisions, and it already makes use of an approximated version of term. This is in contrast to algorithm 1, where term = pow/fact is a single operation. The table below shows the differences in term starting from j=5:
 j   term(alg-1)                term(alg-2)
     0.12345678901234567        0.12345678901234567
 1   5.00000000000000000E+00    5.00000000000000000E+00
 3  -2.08333333333333321E+01   -2.08333333333333321E+01
 5   2.60416666666666679E+01    2.60416666666666643E+01
 7  -1.55009920634920633E+01   -1.55009920634920633E+01
 9   5.38228891093474449E+00    5.38228891093474360E+00
11  -1.22324747975789649E+00   -1.22324747975789627E+00
13   1.96033249961201361E-01    1.96033249961201306E-01
15  -2.33372916620477808E-02   -2.33372916620477738E-02
17   2.14497166011468543E-03    2.14497166011468499E-03
The size of the error created in each step is of the magnitude:

AE(term(j-2)) * xx / (j*(j-1))
Can we do better? The answer is clearly yes, but we need to use some 'tricks'. The easiest way would be to use a higher precision and then convert everything to double precision, but this is cheating!
Earlier it was mentioned that loss of significance introduces additional errors, and in this comment it was mentioned that from 23! onwards you lose accuracy in the factorial, as it can no longer be represented exactly as a binary64 number.
We will try to improve the answer by keeping track of an error term and making use of what is known as compensated summation or Kahan summation. As was indicated at the beginning of this post, it is the inverse factorial that essentially leads to the loss of precision.
computing the inverse factorial: instead of computing the factorial, we will compute the inverse of the factorial in the following way. Imagine we have computed f ~ 1/(n-1)! with a corresponding error term q such that f + q ~ 1/(n-1)!. We know that the FPR of f can be written as f = a * b^(e-p) with a, b, e, p integers. So it is possible to use integer division to compute 1/n! and its corresponding error term with the following pair of operations:

f = (a/n) * b^(e-p)               (integer division: the quotient part)
q = (q + mod(a,n) * b^(e-p))/n    (the remainder, folded into the error term)

In Fortran this leads to the following code:
f = 1.0_dp; q = 0.0_dp
do i = 2, 10
    ! retrieving the integer significand a from 1/(i-1)!
    ! (INT64 comes from the intrinsic iso_fortran_env module)
    a = int(scale(fraction(f), digits(f)), kind=INT64)
    ! computing the error on f while keeping track of the previous error
    q = (q + scale(real(mod(a,i), kind=dp), exponent(f) - digits(f)))/i
    ! computing f/i, resembling 1/i!
    f = scale(real(a/i, kind=dp), exponent(f) - digits(f))
    ! rebalancing the terms
    t = f; f = f + q; q = q - (f - t)
end do
Here, the last line rebalances f with its error q, a trick stemming from Kahan summation.
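One way to check that the f/q pair is worth the trouble (a sketch under my own assumptions: the program name is mine, and real128 support from iso_fortran_env is assumed; any arbitrary-precision check would serve equally well) is to compare f + q against a quad-precision 1/10! after the loop, next to the error of a single rounded double-precision division:

! Sketch: verify that f + q tracks 1/10! beyond double precision.
program check_inverse_factorial
    use, intrinsic :: iso_fortran_env, only: int64, real128
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: f, q, t
    integer(int64) :: a
    integer :: i
    f = 1.0_dp; q = 0.0_dp
    do i = 2, 10
        a = int(scale(fraction(f), digits(f)), int64)
        q = (q + scale(real(mod(a, int(i, int64)), dp), exponent(f) - digits(f)))/i
        f = scale(real(a/i, dp), exponent(f) - digits(f))
        t = f; f = f + q; q = q - (f - t)
    end do
    ! error of f + q versus quad-precision 1/10! ...
    print *, abs(real(f, real128) + real(q, real128) - 1.0_real128/3628800.0_real128)
    ! ... compared with the error of a single rounded division
    print *, abs(real(1.0_dp/3628800.0_dp, real128) - 1.0_real128/3628800.0_real128)
end program check_inverse_factorial

The first printed error should be noticeably smaller than the second, since q carries the residual beyond the 53 bits stored in f.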
Kahan summation with inverse factorial ... the new sin:

It is now possible to combine this trick with algorithm 2 to make it a bit better:
pow = x; sum = x; xx = x*x; err = 0.0_dp
f = 1.0_dp; q = 0.0_dp
do i = 2, 30
    j = 2*i - 1
    ! The inverse-factorial part: f + q ~ 1/j!
    a = int(scale(fraction(f), digits(f)), kind=INT64)
    q = (q + scale(real(mod(a, j*(j-1)), kind=dp), exponent(f) - digits(f)))/(j*(j-1))
    f = scale(real(a/(j*(j-1)), kind=dp), exponent(f) - digits(f))
    t = f; f = f + q; q = q - (f - t)
    pow = -pow*xx   ! computes (-1)^(i-1) * x^j
    ! Kahan summation of pow*f and pow*q
    t = pow*f; err = err + (t - ((sum + t) - sum)); sum = sum + t
    t = pow*q; err = err + (t - ((sum + t) - sum)); sum = sum + t
    t = sum; sum = sum + err; err = err - (sum - t)
end do
This algorithm leads to remarkably good results: excellent accuracy in quadrants 1 and 2, slightly worse in the third quadrant, and still okay in the fourth. It has, however, horrible accuracy near Pi and 2Pi.
Upvotes: 1
Reputation: 153508
I can't figure out where the error is.
A high-precision sine(5.0) is -0.95892427466313846889...
OP’s result is better than Prof’s.
OP's result is within 14 ULP of the best answer whereas Prof's is 25 ULP off.
So no problem on OP’s part. To get an exact match to the Prof's answer, you would have to code an inferior approach.
A simple possible reason for the Prof's inferior answer is if the Prof's code looped only while i <= 30 (15 terms rather than 30), which would just about explain the difference. Try your code with fewer iterations and see which iteration count most closely matches the Prof's answer.
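One way to test that hypothesis (a sketch; the program name, format, and loop bounds are my own) is to rerun the OP's loop for every term count from 10 to 30 and compare each sum against the Prof's value:

! Sketch: which term count reproduces the Prof's value for x = 5?
program find_term_count
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: x, temp, summ
    integer :: n, i
    x = 5.0_dp
    do n = 10, 30
        temp = x; summ = x
        do i = 3, 2*n - 1, 2
            temp = ((-temp)*(x**2))/(i*(i-1))
            summ = summ + temp
        end do
        print '(I3, F22.17)', n, summ   ! compare against -0.95892427466313568
    end do
end program find_term_count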
//  sine(5.0)                  Delta from correct    Source
//
// -0.95892427466313846889...     0      000000      calc
// -0.95892427466313823         -23      889...      chux (via some C code)
// -0.95892427466314001        +154      111...      OP
// -0.95892427466313568        -278      889...      Prof
//  0.00000000000000089       +/-89      ...         ulp(5.0)
//  0.00000000000000011       +/-11      ...         ulp(-0.9589)
//   12345678901234567 (digit count)
Notes:

Near x == 5.0, after about the 17th term, the temp term is too small to significantly affect the result. So certainly enough terms are used.

With common FP representation, the unit in the last place (ULP) of 5.0 is ~89e-17. The ULP of -0.9589 is ~11e-17.
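Both ULP values can be confirmed with Fortran's spacing intrinsic (a minimal sketch; the program name is mine):

! Sketch: unit in the last place of 5.0 and of -0.9589 in binary64.
program ulp_demo
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    print *, spacing(5.0_dp)     ! ~8.88E-16, i.e. ~89e-17
    print *, spacing(0.9589_dp)  ! ~1.11E-16, i.e. ~11e-17
end program ulp_demo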
Upvotes: 4
Reputation: 9382
I will emulate the two algorithms with some ArbitraryPrecisionFloat to illustrate how much worse the factorial solution is numerically. I'll use Squeak Smalltalk here, but the language does not really matter; you could do it in Maple or Python, as long as you have some arbitrary-precision library available...
The closest binary64 floating-point number to the exact result of sin(5) is -0.9589242746631385. We'll see how well the two algorithms approximate that ideal value for different precisions (ranging from single precision, 24 bits, to long double precision, 64 bits).
p := 24 to: 64.
e1 := p collect: [:n |
    | x numTerms denom temp summ closest |
    closest := (5 asArbitraryPrecisionFloatNumBits: 100) sin asFloat.
    x := 5 asArbitraryPrecisionFloatNumBits: n.
    numTerms := 30.
    denom := (2*numTerms)-1.
    temp := x.
    summ := x.
    3 to: denom by: 2 do: [:i |
        temp := (((0-temp)*(x**2))/(i*(i-1))).
        summ := summ + temp].
    (summ asFloat - closest) abs].
Then the factorial rewrite:
e2 := p collect: [:n |
    | x numTerms denom temp summ closest fact |
    closest := (5 asArbitraryPrecisionFloatNumBits: 100) sin asFloat.
    x := 5 asArbitraryPrecisionFloatNumBits: n.
    numTerms := 30.
    denom := (2*numTerms)-1.
    temp := x.
    summ := x.
    3 to: denom by: 2 do: [:i |
        fact := ((1 to: i) collect: [:k | k asArbitraryPrecisionFloatNumBits: n]) product.
        temp := ((x ** i)*(-1 ** (i//2)))/fact.
        summ := summ + temp].
    (summ asFloat - closest) abs].
We can then plot the result in whatever language (here MATLAB):
p=24:64;
e1=[1.8854927952283163e-8 4.8657250339978475e-8 2.5848555629259806e-8 6.355841153382613e-8 3.953766758435506e-9 2.071757310151412e-8 2.0911216092045493e-9 6.941377472813315e-10 4.700154709880167e-10 9.269683909352011e-10 6.256184459374481e-11 3.1578795134379334e-10 2.4749646776456302e-11 3.202560439063973e-11 1.526812010155254e-11 8.378742144543594e-12 3.444688978504473e-12 6.105005390111273e-12 9.435785486289205e-13 7.617240171953199e-13 2.275957200481571e-14 1.6486811915683575e-13 2.275957200481571e-14 5.1181281435219717e-14 1.27675647831893e-14 1.2101430968414206e-14 1.2212453270876722e-15 2.7755575615628914e-15 5.551115123125783e-16 1.5543122344752192e-15 1.1102230246251565e-16 1.1102230246251565e-16 0.0 1.1102230246251565e-16 0.0 0.0 0.0 0.0 0.0 0.0 0.0];
e2=[9.725292443585332e-7 4.281799078631465e-7 2.721746682476933e-7 1.823107481646602e-7 9.336073392152144e-8 5.1925587718493205e-8 1.6992282803052206e-8 6.756442849642497e-9 5.1179199767048544e-9 3.0311525511805826e-9 1.2180066955025382e-9 6.155346232716852e-10 2.8668412088705963e-10 6.983780220792823e-11 6.476741365446514e-11 3.8914982347648674e-11 1.7473689162272876e-11 1.2084888645347291e-11 4.513389662008649e-12 1.7393864126802328e-12 1.273314786942592e-12 5.172529071728604e-13 2.5013324744804777e-13 1.6198153929281034e-13 6.894484982922222e-14 2.8754776337791554e-14 1.6542323066914832e-14 8.770761894538737e-15 4.773959005888173e-15 2.7755575615628914e-15 7.771561172376096e-16 3.3306690738754696e-16 3.3306690738754696e-16 1.1102230246251565e-16 1.1102230246251565e-16 0.0 0.0 0.0 0.0 0.0 0.0];
figure; semilogy(p,e1,p,e2,'-.'); legend('your algorithm','factorial algorithm'); xlabel('float precision'); ylabel('error')
Your algorithm performs better: about one order of magnitude less error than the factorial variant (see the plot generated by the code above).
The worst thing in the factorial variant is that it depends on the intrinsic power function x**power. This function does not necessarily return the floating-point number nearest to the exact result, and may vary depending on the underlying mathematical library implementation. So asking for a bit-identical result that depends not only on strict IEEE 754 compliance but also on an implementation-defined accuracy is a really dumb thing - unless all students have the exact same hardware and software - but even then, what kind of lesson is that with respect to what every scientist should know about floating point?
Upvotes: 3
Reputation: 411
I ended up talking to a classmate who got the exact answer. He told me that he had built a factorial function, and he suggested that I compute the terms using an if/else statement: if the term is odd-numbered, add it; if it is even-numbered, subtract it. I followed his suggestion and ended up with the correct output.
For reference, these are the test cases of my prof:
5 30 -> -0.95892427466313568
4 100 -> -0.75680249530792754
This is my code:
! Sine value using MacLaurin series
recursive function factorial(n) result (f)
double precision :: f
double precision, intent(in) :: n
if (n <= 0) then
f = 1
else
f = n * factorial(n-1)
end if
end function factorial
program SineApprox
implicit none
integer :: numTerms, i, oddeven
double precision :: x, summ, factorial, odd, even, power
! Read Angle in Radians, and numTerms
read(*,*) x, numTerms
! Computing MacLaurin series
summ = 0
power = 1
do i = 1, numTerms, 1
oddeven = modulo(i,2)
if(oddeven == 1) then
odd = (x**power)/factorial(power)
summ = summ + odd
else if (oddeven == 0) then
even = (x**(power))/factorial(power)
summ = summ - even
end if
power = power + 2
end do
print *, summ
end program SineApprox
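For reference, this is how the program can be compiled and fed the Prof's first test case (a sketch; gfortran and the file name sine_approx.f90 are placeholders for whatever compiler and name you use):

gfortran sine_approx.f90 -o sine_approx
echo "5 30" | ./sine_approx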
Upvotes: 1