Reputation:
I have been using scipy to fit 3-D data to a surface defined by a polynomial function, but the result does not look close to the data. How can I improve the fit?
import numpy as np
from scipy.optimize import curve_fit
from string import ascii_uppercase

# import my data (an N x 3 array of x, y, z points)
data = my_data_matrix

# define the polynomial (quadratic) surface function
def func(X, A, B, C, D, E, F):
    # unpack the multi-dim. array column-wise, hence the transpose;
    # z is not used in the model itself, it is the value being fitted
    x, y, z = X.T
    return (A * x ** 2) + (B * y ** 2) + (C * x * y) + (D * x) + (E * y) + F

# fit the polynomial function to the 3d data (z as a function of x and y)
popt, _ = curve_fit(func, data, data[:, 2])

# print coefficients of the polynomial function, i.e., A, B, C, D, E and F
for i, j in zip(popt, ascii_uppercase):
    print(f"{j} = {i:.3f}")
In this case I got:
A = 0.903
B = 0.022
C = 0.325
D = -362.140
E = -52.875
F = 31057.352
The fitted surface is compared to the original data (scatter points):
Upvotes: 0
Views: 2241
Reputation: 676
Are you sure your data comes from something that is a quadratic surface and doesn't have any noise? curve_fit is essentially doing the multi-dimensional analogue of a line of best fit. A line of best fit is what you compute when your data are spread out roughly like a line, but not exactly, and you want the line through the data that is closest to it. "Closeness" is defined like this: for each data point, take the difference between where the point actually is and where the line predicts it should be, square it, and add this up over all the data points. The line of best fit is the line that minimizes this sum.
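As a minimal sketch of that sum-of-squared-differences idea (using made-up 1-D data, not the data from the question):

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # a noisy line

def line(x, m, b):
    return m * x + b

popt_line, _ = curve_fit(line, x, y)
residuals = y - line(x, *popt_line)
sse = np.sum(residuals ** 2)  # the quantity least squares minimizes
print(f"slope = {popt_line[0]:.3f}, intercept = {popt_line[1]:.3f}, SSE = {sse:.3f}")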
Now if the data is noisy (which it almost always is), then the line of best fit won't pass exactly through each point, but it should be close. If you have good reason to think your data has a linear relationship, this is fine, and the size of the residuals tells you how noisy the data is.
Extending this to your example, you're trying to find the best surface that is quadratic in both x and y to fit your data. If you have reason to believe the process generating the data is quadratic, then the differences you see in the graph are just the noise in your data.
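To put a number on those differences, you can compute the residuals of your fitted surface. A small sketch, assuming func, data and popt from your code are still in scope:

residuals = data[:, 2] - func(data, *popt)
print(f"residual std. dev.: {residuals.std():.3f}")
print(f"sum of squared residuals: {np.sum(residuals ** 2):.3f}")

If the residual standard deviation is about the size of the measurement noise you expect, the quadratic model is doing about as well as it can.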
However, it may be that your data really comes from something cubic or higher order. You can try those kinds of functions, but don't go too crazy with it; data from a physical process usually isn't very high order. Going overboard with your model is called overfitting. A higher-order function will always decrease the error on your data, and you can even reach the point where you predict all of your data "perfectly" (with a polynomial of high enough degree). However, if you overfit (i.e., use too high an order), then when you get new data your overfit model will predict it worse than a simpler model would.
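If you do want to try a higher order, one option is a full cubic surface. This is only a sketch that reuses data, func, popt, numpy and curve_fit from the question; the name func_cubic and the extra coefficients G through J are made up here:

def func_cubic(X, A, B, C, D, E, F, G, H, I, J):
    x, y, _ = X.T
    return (A * x ** 3 + B * y ** 3 + C * x ** 2 * y + D * x * y ** 2
            + E * x ** 2 + F * y ** 2 + G * x * y + H * x + I * y + J)

popt3, _ = curve_fit(func_cubic, data, data[:, 2])
res2 = data[:, 2] - func(data, *popt)           # quadratic residuals
res3 = data[:, 2] - func_cubic(data, *popt3)    # cubic residuals
print(f"quadratic SSE: {np.sum(res2 ** 2):.3f}")
print(f"cubic SSE:     {np.sum(res3 ** 2):.3f}")

The cubic SSE will almost always be lower on the same data, so by itself that is not evidence the cubic model is better; checking the fit on held-out data is the safer test for overfitting.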
Upvotes: 1