Mascarpone
Mascarpone

Reputation: 2556

The numerical solution doesn't diverge as it should. Why?

I'm trying to use the Euler Method to approximate a differential equation:

u'=3*(u-t)
u(0)=1/3

To problem should diverge for a big number of steps with float precision. This is due to the rounding error of the initial data. To some of my friends this code diverges, for me it doesn't.

Is it possible that the compiler is increasing precision?

Update for comment:

@KillzoneKid no g++ -Wall -pedantic main.cpp doesn't print anything

@some-programmer-dude the actual output is the exact solution, while the output should diverge, due to floating error. Something like 7189 instead of 10+ 1/3 (exact solution)

@pac0 the compiled output doesn't work, because everybody else is using a very old version of linux (kernel 2.6), but I'll try to set up the compiler options.

The trend is that people with mac (friends of mine) and people with modern linux (me) experience this issue, while people on windows or very old linux don't.

@1201ProgramAlarm I use ubuntu 17.10, on a XPS 15, with CLion as editor. For simplicity, I'm using g++ bundled in the OS.

#include <iostream>
#include <math.h>
#include <fstream>

using namespace std;
typedef float Real;
Real f(Real t,Real u);
const Real pi=4.0*atan(1.0);

int main() {

    Real u,t,T,tau;


    // print to file
    char n_file[21]={0};
    std::cout << "file name: " << std::endl;
    std::cin >> n_file ;

    ofstream prt(n_file);
    prt.precision(15);
    //

    t=0.0;//start time
    T=10.0;//final time
    u=1.0/3.0;// u(0)
    unsigned long N;
    for(int k=1;k<=20;k++){
        N=(unsigned long)pow(10,k);
        tau=(T-t)/Real(N);
        for(unsigned long n=1;n<=N;n++){
            u+=tau*f(t,u);
            t+=tau;
        }
        prt << "With " << N << " steps, at time " << tau << "result is " <<u<<endl;

        prt << endl << endl << endl;

        u = 1.0/3.0;
        t = 0.0;

    }

//

    return 0;
}


//
Real f(Real t, Real u)
{
    return 3*(u-t);
}

Upvotes: 0

Views: 153

Answers (1)

rcgldr
rcgldr

Reputation: 28921

The calculation for u=1.0/3.0 + 1e-8; is done in double precision, then rounded to the nearest float value when assigned to u. Using Windows and Visual Studio, the + 1e-8 gets rounded away during the conversion from double to float, but + 1e-7 is large enough to affect the float result:

1.0/3.0 + 1e-8 == 1.0/3.0 == 32 bit hex integer 0x3eaaaaab
                          ~= 0.33333334

1.0/3.0 + 1e-7 ==            32 bit hex integer 0x3eaaaaae
                          ~= 0.33333343

The difference between environments could be an issue of the rounding setting in the floating point control word, and also hardware implementations.


I changed the code to use a step count that is a power of 2 (in this case a power of 8), and limited the max number of steps to correspond to the number of significant digits in the mantissa part of a float. This eliminated the divergence, partially because the initial value of u is slightly greater than 1/3, while t and tau are exact (since step count is a power of 2), and partially because multiplying by tau decreases the error more than the repeated addition increases the the error. Initialize u to 1/3 - 1e-7, and the sum diverges at 64 steps, becoming -5770, but for 8 steps it's 10.30, and for >= 512 steps it's 10.333333.

#include <iomanip>
#include <iostream>
#include <math.h>
#include <fstream>

using namespace std;
typedef float Real;
Real f(Real t,Real u);

int main() {
    Real u,t,T,tau;

    // print to file
    char n_file[21]={0};
    std::cout << "file name: " << std::endl;
    std::cin >> n_file ;

    ofstream prt(n_file);
    prt.precision(15);

    T=10.0;    //final time
    unsigned long N = 1;
    for(int k=1;k<=7;k++){
        N *= 8;                 // # steps is power of 2
        u = (Real)(1.0/3.0);
        t = 0.0;
        tau=T/Real(N);
        for(unsigned long n=1;n<=N;n++){
            u+=tau*f(t,u);
            t+=tau;
        }
        prt << "With " << setw(8) << N << " steps result is " << u <<endl;
    }

    return 0;
}

Real f(Real t, Real u)
{
    return 3*(u-t);
}

Upvotes: 1

Related Questions