zuzu
zuzu

Reputation: 411

Small difference in Fortran Sine function using MacLaurin expansion

I'm creating a program in Fortran that takes in an x for sin(x) in radians, then the number of terms to compute.

This is my program:

! Sine value using MacLaurin series 

program SineApprox
    implicit none
    integer :: numTerms, denom, i
    double precision :: x, temp, summ

    ! Read Angle in Radians, and numTerms
    read(*,*) x, numTerms

    ! Computing MacLaurin series
    denom = (2*numTerms)-1  !Gives denominator
    temp = x                !Temp calculates each term
    summ = x

    do i = 3, denom, 2
        temp = (((-temp)*(x**2))/(i*(i-1)))
        summ = summ + temp 
    end do
    print *, summ

end program SineApprox

However, I'm not getting the same value that my prof requires for the input: 5 30

The output of my code is:

-0.95892427466314001  

But, the required output is:

-0.95892427466313568
                ^^^^

I can't figure out where the error is.

Upvotes: 3

Views: 532

Answers (4)

kvantour
kvantour

Reputation: 26481

In contrast to what was demonstrated in both Chux's and aka.nice their answers and mentioned in various comments, I believe we jumped conclusion by focusing only on a single computation, i.e. sin(5). Eventhough the OP's answer is closer to the true value, it must be said that both the OP's algorithm and the factorial algorithm are as bad for computing sin(5), but in the end, the factorial algorithm is the better one.

This answer will not go into to much detail about Floating-Point Arithmetic. An excelent monograph can be found on this topic in What Every Computer Scientist Should Know About Floating-Point Arithmetic. Neither will I open the can of worms of Intermediate Floating-Point Precision.

disclaimer: I don't want to claim that what I write here is 100% correct and I am for sure not an authority in this field. I just found the question extremely interesting and tried to get as much out of it as I could. I obviously welcome any comment!

some interesting reads that lead to this:

Some initial concepts

  • floating point representation (FPR): in a finite floating point representation, a number is written as ±0.d1d2 ... dp x be. It has a precision p and an even base b where all digits di are integers with 0 ≤ di < b. It is straightforward that such representation cannot represent all real numbers. For example, if b = 10 and p = 3, the number 1/3 is approximated as 0.333 x 100 or if b=2 and p=24 one has to approximate 0.1 as 0.110011001100110011001101 x 25.

  • absolute error (AE): if a real number x has an AE of Δx, then we know that the expected true value of x lays between x - Δx ≤ x̃ ≤ x + Δx. If x is the closest FPR to the true value , then we know that its AE is ( (b/2) b-p - 1 ) x be. For example, if b = 10 and p = 3, the number 1/3 is approximated as 0.333 x 100 and has an absolute error of 0.0005 x 100 indicating that 0.3325 x 100 ≤ 1/3 ≤ 0.3335 x 100

What does this all mean for the Taylor-MacLaurin series?

For a standard IEEE binary64 (double precision) computation (b = 2 and p = 53) of the Taylor-MacLaurin series of sin(5) one quickly sees that the horse already bolted at the 2nd and 3rd term. Here, the FPR is only accurate upto 2^-49 as can be seen in the table below (representing nearest binary64 representation of the true fractions):

j term        FPR                     e AE
              0.12345678901234567    
1  5^1/1!     5.00000000000000000E+00 3 2^-51 4.4E-16
3 -5^3/3!    -2.08333333333333321E+01 5 2^-49 1.8E-15
5  5^5/5!     2.60416666666666679E+01 5 2^-49 1.8E-15
7 -5^7/7!    -1.55009920634920633E+01 4 2^-50 8.9E-16

j  sum        FPR                      e AE
              0.12345678901234567    
1     5       5.00000000000000000E+00  3 2^-51 4.4E-16
3   -95/6    -1.58333333333333339E+01  4 2^-50 8.9E-16
5   245/24    1.02083333333333339E+01  4 2^-50 8.9E-16
7 -5335/1008 -5.29265873015873023E+00  3 2^-51 4.4E-16

The accuracy upto 2^-49 can be understood in the following way. If you look at the term 5^3/3! then the nearest FPR of this number is the fraction (5864062014805333 / 9007199254740992) * 2^5. As you see, we are missing a part here namely

5^3/3! - 5864062014805333 / 9007199254740992) * 2^5
    = 1/844424930131968
    ~ 1.1842378929335002E-15 < 2^-49

So no matter wat we try to do, we always miss this part. so it is not possible to compute sin(5) with an accuracy better then 2^-49. As this is the accuarcy of the biggest term (absolute value) that is added to the sum. However, other terms also introduce errors and all these errors accumulate. So after the first 30 terms, we know that the AE for sin(5) accumulate to be :

2^-51 + 2^-49 + 2^-49 + 2^50 + ... = 5.45594...E-15

This number, is however too idealistic as there is more loss of precision due to loss of significance.

Which algorithm is better?

The two presented algorithms are (assuming all variables are IEEE binary64 (double precision) variables with exception of i):

  • algorithm 1: This is a slightly adopted version of the algorithm presented here

    fact = 1.0_dp; pow  = x; sum = x; xx = x*x
    do i=2,30
       j=2*i-1
       fact = fact*j*(j-1)
       pow  = -pow * xx
       term = pow/fact
       sum = sum + term
    end do
    
  • algorithm 2: This is a slightly adopted version of the algorithm presented here

    term = x; sum = x; xx = x*x
    do i=2,30
      j=2*i-1
       term = (-term*xx)/(j*(j-1))
       sum = sum + term
    end do
    

While both algorithms are mathematically the same, there is a subtle numerical difference. In floating-point operations, divisions and multiplications are considered to be safe, that is they will always result in the nearest FPR of the true result. That is, with the given input. This, however, does not mean that multiple divisions and multiplication will lead to the nearest FPR of the full computation.

This is why algorithm 2 is slightly worse then algorithm 1. The computation of term = (-term*xx)/(j*(j-1)) contains multiple multiplications and divisions and already makes use of an approximated version of term. This in contrast to algorithm 1 where term = pow/fact is a single operation.

The table below shows the differences in term starting from j=5:

 j  term(alg-1)              term(alg-2)
    0.12345678901234567      0.12345678901234567
 1  5.00000000000000000E+00  5.00000000000000000E+00
 3 -2.08333333333333321E+01 -2.08333333333333321E+01
 5  2.60416666666666679E+01  2.60416666666666643E+01
 7 -1.55009920634920633E+01 -1.55009920634920633E+01
 9  5.38228891093474449E+00  5.38228891093474360E+00
11 -1.22324747975789649E+00 -1.22324747975789627E+00
13  1.96033249961201361E-01  1.96033249961201306E-01
15 -2.33372916620477808E-02 -2.33372916620477738E-02
17  2.14497166011468543E-03  2.14497166011468499E-03

The size of the created error is of the magnitude :

AE(term(j-2)) * xx / (j*(j-1))

Can we still make improvements?

The answer is clearly a yes, but we need to use some 'tricks'. The easiest way would be to use a higher-precision and then convert everything to double-precision, but this is cheating!

Earlier it was mentioned that loss of significance will introduce more errors and in this comment it was mentioned from 23! onwards, you loose accuracy in your factorial as it cannot be represented correctly anymore as a binary64 number.

We will try to improve the answer, by keeping track of an error term and making use of what is known as compensated summation or Kahan summation. As was indicated early in the beginning of this post, it is the inverse factorial that essentially leads to loss of precision.

computing the inverse factorial: instead of computing the factorial we will compute the inverse of the factorial in the following way. Imagine we computed f ~ 1/(n-1)! with corresponding error-term q such that f + q ~ 1/(n-1)! . We know that the 'FPR' of f can be written in the following way :f = a * b e-p with a, b, e, p integers. So it is possible to use integer division to compute the 1/n! and its corresponding error term using the following set of operations:

f = (a/n) * b^(e-p)
q = (q + mod(a,n) * b^(e-p))/n

in Fortran this leads to the following code :

f = 1.0_dp; q = 0.0_dp
do i = 2,10
   ! retrieving the integer from 1/(i-1)!
   a = int(scale(fraction(f),digits(f)),kind=INT64)
   ! computing the error on f while keeping track of the
   ! previous error
   q = (q + scale(real(mod(a,i),kind=dp),exponent(f)-digits(f)))/i
   ! computing f/i resembling 1/i!
   f =      scale(real(a/i     ,kind=dp),exponent(f)-digits(f))
   ! rebalancing the terms
   t = f; f = f + q; q = q - (f - t)
end do

Here, the last line rebalances f with its error q. A trick stemming from the Kahan summation.

Kahan summation with inverse factorial ... the new sin:

It is now possible to combine this trick with algorithm 2 to make it a bit better :

pow = x; sum = x; xx = x*x; err = 0.0_dp;
f = 1.0_dp; q = 0.0_dp
do i=2,30
   j=2*i-1
   ! The factorial part
   a = int(scale(fraction(f),digits(f)),kind=INT64)  
   q = (q + scale(real(mod(a,j*(j-1)),kind=dp),exponent(f)-digits(f)))/(j*(j-1))
   f =      scale(real(a/(j*(j-1))   ,kind=dp),exponent(f)-digits(f))
   t = f; f = f + q; q = q - (f - t)

   pow = -pow*xx  ! computes x^j
   ! Kahan summation
   t = pow*f; err = err + (t - ((sum + t) - sum)); sum = sum + t
   t = pow*q; err = err + (t - ((sum + t) - sum)); sum = sum + t
   t = sum; sum = sum + err; err = err - (sum - t)
end do

This algorithm leads to remarkably good results. An excellent accuracy in the quadrant 1 and 2 and slightly worse in the third quadrant. The fourth quadrant is still okay. It has however, horrible accuracy near Pi and 2Pi.

Upvotes: 1

chux
chux

Reputation: 153508

I can't figure out where the error is.

A high precisions sine(5.0) is -0.95892427466313846889...

OP’s result is better than Prof’s.

OP's result is within 14 ULP of the best answer whereas Prof's is 25 ULP off.

So no problem on OP’s part. To get an exact match to the Prof's answer, you would have to code an inferior approach.

A simple possible reason for the Prof's inferior answer is if Prof's code looped only while i<=30 (15 terms, rather than 30) - which would just about explain the difference. Try using your code with fewer iterations and see what iteration count closest matches the Prof's answer.


//  sine(5.0)             Delta from correct  Source
//  
//-0.95892427466313846889...    0 000000      calc
//-0.95892427466313823        -23 889...      chux (via some C code)
//-0.95892427466314001       +154 111...      OP
//-0.95892427466313568       -278 889...      Prof
// 0.00000000000000089      +/-89 ...         ulp(5.0)
// 0.00000000000000011      +/-11 ...         ulp(-0.9589)
//   12345678901234567(digit count)

Notes:
Near x == 5.0, after about the 17th term, the temp term is so small to not significantly affect the result. So certainly enough terms are used.

With common FP representation, the unit in the last place of 5 is ~89e-17. The ULP of -0.9589 if is ~11e-17.

Upvotes: 4

aka.nice
aka.nice

Reputation: 9382

I will emulate the two algorithms with some ArbitraryPrecisionFloat to illustrate how worse is the factorial solution numerically: I'll use Squeak Smalltalk here, but the language does not really matter, you could do it in Maple or Python, as long as you have some arbitrary precision library available...

The closest binary64 floating point number to the exact result of sin(5) is -0.9589242746631385.

We'll see how well the two algorithms approximate that ideal value for different precision (ranging from single precision 24 bits to long double precision 64 bits).

p := 24 to: 64.
e1 := p collect: [:n |
    | x denom temp summ closest |
    closest := (5 asArbitraryPrecisionFloatNumBits: 100) sin asFloat.
    x := 5 asArbitraryPrecisionFloatNumBits: n.
    numTerms := 30.
    denom := (2*numTerms)-1.
    temp := x.
    summ := x.
    3 to: denom by: 2 do: [:i |
        temp := (((0-temp)*(x**2))/(i*(i-1))).
        summ := summ + temp ].
    (summ asFloat - closest) abs].

Then the factorial rewrite:

e2 := p collect: [:n |
    | x denom temp summ closest fact |
    closest := (5 asArbitraryPrecisionFloatNumBits: 100) sin asFloat.
    x := 5 asArbitraryPrecisionFloatNumBits: n.
    numTerms := 30.
    denom := (2*numTerms)-1.
    temp := x.
    summ := x.
    3 to: denom by: 2 do: [:i |
        fact := ((1 to: i) collect: [:k | k asArbitraryPrecisionFloatNumBits: n]) product.
        temp := ((x ** i)*(-1 ** (i//2)))/fact.
        summ := summ + temp ].
    (summ asFloat - closest) abs].

We can then plot the result in whatever language (here Matlab)

p=24:64;
e1=[1.8854927952283163e-8 4.8657250339978475e-8 2.5848555629259806e-8 6.355841153382613e-8 3.953766758435506e-9 2.071757310151412e-8 2.0911216092045493e-9 6.941377472813315e-10 4.700154709880167e-10 9.269683909352011e-10 6.256184459374481e-11 3.1578795134379334e-10 2.4749646776456302e-11 3.202560439063973e-11 1.526812010155254e-11 8.378742144543594e-12 3.444688978504473e-12 6.105005390111273e-12 9.435785486289205e-13 7.617240171953199e-13 2.275957200481571e-14 1.6486811915683575e-13 2.275957200481571e-14 5.1181281435219717e-14 1.27675647831893e-14 1.2101430968414206e-14 1.2212453270876722e-15 2.7755575615628914e-15 5.551115123125783e-16 1.5543122344752192e-15 1.1102230246251565e-16 1.1102230246251565e-16 0.0 1.1102230246251565e-16 0.0 0.0 0.0 0.0 0.0 0.0 0.0];
e2=[9.725292443585332e-7 4.281799078631465e-7 2.721746682476933e-7 1.823107481646602e-7 9.336073392152144e-8 5.1925587718493205e-8 1.6992282803052206e-8 6.756442849642497e-9 5.1179199767048544e-9 3.0311525511805826e-9 1.2180066955025382e-9 6.155346232716852e-10 2.8668412088705963e-10 6.983780220792823e-11 6.476741365446514e-11 3.8914982347648674e-11 1.7473689162272876e-11 1.2084888645347291e-11 4.513389662008649e-12 1.7393864126802328e-12 1.273314786942592e-12 5.172529071728604e-13 2.5013324744804777e-13 1.6198153929281034e-13 6.894484982922222e-14 2.8754776337791554e-14 1.6542323066914832e-14 8.770761894538737e-15 4.773959005888173e-15 2.7755575615628914e-15 7.771561172376096e-16 3.3306690738754696e-16 3.3306690738754696e-16 1.1102230246251565e-16 1.1102230246251565e-16 0.0 0.0 0.0 0.0 0.0 0.0];
figure; semilogy(p,e1,p,e2,'-.'); legend('your algorithm','factorial algorithm'); xlabel('float precision'); ylabel('error')

Your algorithm performs better: one magnitude of error lesser than the factorial variant:

enter image description here

The worse thing in factorial variant is that it depends on the intrinsic power function x**power. This function does not necessarily answer the nearest floating point to the exact result and may vary depending on the underlying mathematical library implementation. So asking for a bit identical result that depends not only on strict IEEE 754 compliance but also on an implementation defined accuracy is a really dumb thing - unless all students have exact same hardware and software - but even then what a lesson is it wrt. what every scientist should know about floating point?

Upvotes: 3

zuzu
zuzu

Reputation: 411

I ended up talking to a classmate who got the exact answer. He told me that he built a factorial function, then he suggested that I solve for the terms using an if else statement: if(odd), then add. else if(even), then subtract. So I followed his suggestion and ended up with the correct output.

For reference, these are the test cases of my prof:

5 30 -> -0.95892427466313568

4 100 -> -0.75680249530792754

This is my code:

! Sine value using MacLaurin series 

recursive function factorial(n) result (f)
    double precision :: f
    double precision, intent(in) :: n
    if (n <= 0) then
        f = 1
    else
        f = n * factorial(n-1)
    end if
end function factorial

program SineApprox
    implicit none
    integer :: numTerms, i, oddeven
    double precision :: x, summ, factorial, odd, even, power

    ! Read Angle in Radians, and numTerms
    read(*,*) x, numTerms

    ! Computing MacLaurin series
    summ = 0
    power = 1
    do i = 1, numTerms, 1
        oddeven = modulo(i,2)

        if(oddeven == 1) then
            odd = (x**power)/factorial(power)
            summ = summ + odd
        else if (oddeven == 0) then
            even = (x**(power))/factorial(power)
            summ = summ - even
        end if
        power = power + 2
    end do

    print *, summ

end program SineApprox

Upvotes: 1

Related Questions