Polynomial fitting problems in numpy

Question

I was trying to implement a standard polynomial fitting program, and came across a problem that I could not understand. Here is the code where I have a sample x and y data, which I fit using the normal equations, and also using the polyfit function in numpy.

def mytest(N):
    print "
Polynomial order (N): {}".format(N)
    mVals = [5, 10, 15, 20]
    a_orig = [198.764, 13.5, 0.523]
    for M in mVals:
        x = arange(-M, M+1)
        y = matrix(a_orig[0]+ a_orig[1]*x + a_orig[2]*x**2).T

        # Code implementing the solution from the normal equations
        nArray = arange(N+1)    
        A = matrix([[n**i for i in nArray] for n in x])
        B = (A.T*A).I
        a_myfit = B*(A.T*y)

        # numpy's polyfit    
        a_polyfit = polyfit(x, y, N)

        print "M: {}".format(M)
        print ["{0:0.3f}".format(i) for i in a_orig] 
        print ["{0:0.3f}".format(i) for i in array(a_myfit)[:,0]]
        print ["{0:0.3f}".format(i) for i in list(array(a_polyfit)[:,0])[::-1]]

mytest(N=5)
mytest(N=6)

Here is the output

Polynomial order (N): 5
M: 5
['198.764', '13.500', '0.523']
['198.764', '13.500', '0.523', '0.000', '0.000', '-0.000']
['198.764', '13.500', '0.523', '-0.000', '0.000', '0.000']
M: 10
['198.764', '13.500', '0.523']
['198.764', '13.500', '0.523', '-0.000', '0.000', '-0.000']
['198.764', '13.500', '0.523', '0.000', '0.000', '0.000']
M: 15
['198.764', '13.500', '0.523']
['198.764', '13.500', '0.523', '0.000', '0.000', '-0.000']
['198.764', '13.500', '0.523', '-0.000', '-0.000', '0.000']
M: 20
['198.764', '13.500', '0.523']
['198.764', '13.500', '0.523', '0.000', '-0.000', '-0.000']
['198.764', '13.500', '0.523', '0.000', '0.000', '0.000']

Polynomial order (N): 6
M: 5
['198.764', '13.500', '0.523']
['198.764', '13.500', '0.523', '0.000', '0.000', '-0.000', '-0.000']
['198.764', '13.500', '0.523', '-0.000', '0.000', '0.000', '-0.000']
M: 10
['198.764', '13.500', '0.523']
['198.764', '13.500', '0.523', '-0.000', '-0.000', '-0.000', '-0.000']
['198.764', '13.500', '0.523', '0.000', '0.000', '-0.000', '-0.000']
M: 15
['198.764', '13.500', '0.523']
['294.451', '13.500', '-0.061', '0.000', '-0.001', '-0.000', '-0.000']
['198.764', '13.500', '0.523', '0.000', '0.000', '0.000', '-0.000']
M: 20
['198.764', '13.500', '0.523']
['369.135', '13.500', '-0.046', '0.000', '-0.000', '-0.000', '-0.000']
['198.764', '13.500', '0.523', '0.000', '0.000', '-0.000', '-0.000']

The values of the polynomial fit gives wrong values for N > 5 and for M > 13.

Where am I going wrong? What is different in the polyfit implementation?

unutbu · Accepted Answer

A is of dtype int32.

The maximum value representable is:

In [10]: np.iinfo(np.dtype('int32')).max
Out[10]: 2147483647

When the exponent is very large, n**i can become larger than 2147483647. At this point you get an incorrect result.

NumPy does not check for arithmetic overflows (when performing operations on arrays) since this would hinder performance. It is up to you to select a dtype which avoids arithmetic overflow.

To fix the problem, declare a different dtype which can represent larger numbers:

A = np.matrix([[n**i for i in nArray] for n in x], dtype='float64')

Note that the problem still exists, it will just occur at a much larger value of N.

Polynomial fitting problems in numpy

Answers (1)

Related Questions