Algebro1000
Algebro1000

Reputation: 169

Polynomial fitting with equal number of data points and coefficients

I am currently experimenting with polynomial fitting using jupyter. The function below returns the least-square polynomial of degree m given the data points in xs with corresponding ys.

from numpy import *
from matplotlib import pyplot as plt

def findas(m,xs,ys):
    A = array([[0]*(m+1)]*(m+1))
    b = array([0]*(m+1))
    for k in range(m+1):
        b[k] = sum(ys*xs**k)
        for i in range(m+1):
            A[k,i] = sum(xs**(k+i))
    coefs = linalg.solve(A,b)
    print(coefs)
    def fit(x):
        return sum(coefs*(x**array(range(len(coefs)))))
    return fit

Suppose I have the following six data points and fit a polynomial of degree 5:

xs = array([1,2,3,4,5,6])
ys = array([-5.21659 ,2.53152 ,2.05687 ,14.1135 ,20.9673 ,33.5652])
ft = findas(5,xs,ys)

From my understanding, the resulting curve should pass through every single data point exactly (in fact, the Lagrange polynomial should be the result).

xdense = arange(1,6.1,0.1)
ydense = [ft(x) for x in xdense]   

plt.plot(xdense,ydense)
plt.plot(xs,ys,'rx')
plt.show()

Sample output:

enter image description here

However, this is not the case. The curve is quite far off! What is going on here? Does this have something to do with round-off error? Thanks in advance!

Upvotes: 0

Views: 355

Answers (2)

Algebro1000
Algebro1000

Reputation: 169

It seems there was a truncation error! The block of code

A = array([[0]*(m+1)]*(m+1))
b = array([0]*(m+1))
for k in range(m+1):
...

should read:

A = array([[0.]*(m+1)]*(m+1))
b = array([0.]*(m+1))
for k in range(m+1):
...

i.e we have to specify the zeros as float.

Moreover, round-off errors can amplify in the process of inverting a matrix. This will particularly be the case when the eigenvalues of the matrix that we want to invert differ significantly in their order of magnitude.

Upvotes: 1

piterbarg
piterbarg

Reputation: 8219

Your code appears correct; you re-discovered the issues with trying to invert nearly-singular matrices with finite-precision arithmetic. The matrix A looks like this

[[       6       21       91      441     2275    12201]
 [      21       91      441     2275    12201    67171]
 [      91      441     2275    12201    67171   376761]
 [     441     2275    12201    67171   376761  2142595]
 [    2275    12201    67171   376761  2142595 12313161]
 [   12201    67171   376761  2142595 12313161 71340451]]

Note how big the difference is between the largest and the smallest value. Its eigevalues are given by

[7.35326515e+07 1.98781177e+04 5.75757797e+01 1.74921341e+00
 5.89932892e-02 1.37532926e-04]

note the ratio of the largest to the smallest is about 10^11. So the matrix is not singular in the theoretical sense but pretty much singular for numerical calculations. Its inversion leads to extremely large round-off errors, just like dividing by very small numbers would, massively losing precision in the final result.

Lots more detail here and associated links

Upvotes: 0

Related Questions