Reputation: 169
I am currently experimenting with polynomial fitting using jupyter. The function below returns the least-square polynomial of degree m given the data points in xs with corresponding ys.
from numpy import *
from matplotlib import pyplot as plt
def findas(m,xs,ys):
A = array([[0]*(m+1)]*(m+1))
b = array([0]*(m+1))
for k in range(m+1):
b[k] = sum(ys*xs**k)
for i in range(m+1):
A[k,i] = sum(xs**(k+i))
coefs = linalg.solve(A,b)
print(coefs)
def fit(x):
return sum(coefs*(x**array(range(len(coefs)))))
return fit
Suppose I have the following six data points and fit a polynomial of degree 5:
xs = array([1,2,3,4,5,6])
ys = array([-5.21659 ,2.53152 ,2.05687 ,14.1135 ,20.9673 ,33.5652])
ft = findas(5,xs,ys)
From my understanding, the resulting curve should pass through every single data point exactly (in fact, the Lagrange polynomial should be the result).
xdense = arange(1,6.1,0.1)
ydense = [ft(x) for x in xdense]
plt.plot(xdense,ydense)
plt.plot(xs,ys,'rx')
plt.show()
Sample output:
However, this is not the case. The curve is quite far off! What is going on here? Does this have something to do with round-off error? Thanks in advance!
Upvotes: 0
Views: 355
Reputation: 169
It seems there was a truncation error! The block of code
A = array([[0]*(m+1)]*(m+1))
b = array([0]*(m+1))
for k in range(m+1):
...
should read:
A = array([[0.]*(m+1)]*(m+1))
b = array([0.]*(m+1))
for k in range(m+1):
...
i.e we have to specify the zeros as float.
Moreover, round-off errors can amplify in the process of inverting a matrix. This will particularly be the case when the eigenvalues of the matrix that we want to invert differ significantly in their order of magnitude.
Upvotes: 1
Reputation: 8219
Your code appears correct; you re-discovered the issues with trying to invert nearly-singular matrices with finite-precision arithmetic. The matrix A looks like this
[[ 6 21 91 441 2275 12201]
[ 21 91 441 2275 12201 67171]
[ 91 441 2275 12201 67171 376761]
[ 441 2275 12201 67171 376761 2142595]
[ 2275 12201 67171 376761 2142595 12313161]
[ 12201 67171 376761 2142595 12313161 71340451]]
Note how big the difference is between the largest and the smallest value. Its eigevalues are given by
[7.35326515e+07 1.98781177e+04 5.75757797e+01 1.74921341e+00
5.89932892e-02 1.37532926e-04]
note the ratio of the largest to the smallest is about 10^11. So the matrix is not singular in the theoretical sense but pretty much singular for numerical calculations. Its inversion leads to extremely large round-off errors, just like dividing by very small numbers would, massively losing precision in the final result.
Lots more detail here and associated links
Upvotes: 0