Reputation: 42439
I have a set of curves F={f1, f2, f3,..., fN}
, each of them defined through a set of points, ie: I don't have the explicit form of the functions. So I have a set of N
tables like so:
#f1: x y
1.2 0.5
0.6 5.6
0.3 1.2
...
#f2: x y
0.3 0.1
1.2 4.1
0.8 2.2
...
#fN: x y
0.7 0.3
0.3 1.1
0.1 0.4
...
I also have a set of observed/measured data points O=[p1, p2, p3,..., pM]
where each point has x, y
coordinates and a given weight between [0, 1]
, so it looks like:
#O: x y w
0.2 1.6 0.5
0.3 0.7 0.3
0.1 0.9 0.8
...
Since N ~ 10000
(I have a big number of functions) what I'm looking for is an efficient (more precisely: fast) way to find the curve that best fits my set of observed and weighted points O
.
I know how to find a best fit with python
when I have the explicit form of the functions (scipy.optimize.curve_fit), but how do I do that when I have the functions defined as tables?
Upvotes: 1
Views: 2359
Reputation: 9963
Here's one possible solution. This combines some of the comments on your original post, and @elyase's solution above. @elyase has provided one way to interpolate between the points you have for each function. Given that, and the definition of best fit being weighted sum of squares, I think the following does what you want:
# Here a model is an interpolated function as per @elyase's solution above
min_score = sys.float_info.max
best_model = None
for model in models:
# data is an array of (x, y, weight) tuples
score = 0.0
for data_point in data:
w = data_point[2]
x = data_point[0]
y = data_point[1]
score += w * (y - model.get_y(x)) ** 2
if score < min_score:
best_model = model
return best_model
You mention that you need a "fast" solution. Based on your answers above, doing the above for each set of data results in about 2 million iterations total. That shouldn't be more than a few seconds, even with Python. Is that fast enough?
If not, things get much more complex. For example, you could try to store your models (you call them functions above) in sorted order such that model1 > model2
if model1(x) > model2(x)
for all x
(given the interpolation thing above). That defines only a partial order, but that might be enough to be very helpful if your models have the right properties. Given that, you can do something akin to binary search. Alternatively, you could do a branch-and-bound thing where the bound is given by the distance between the first value in data and the first value in the function. Depending on the nature of your functions and data that might or might not help. There are solutions you could consider if you required an almost exact, but not necessarily optimal answer, etc., etc., etc. In short, to go beyond the trivial answer above, I think we need to know more about your time constraints, the data, and the models.
Upvotes: 1
Reputation: 11644
Here's my suggested approach:
For example:
import numpy as np
# define an array of N=3 functions
funcs = np.array([
[[0, 1, 2, 3, 4, 5], # x1
[0, 1, 2, 1, 0, 0]], # y1
[[0, 1, 2, 3, 4, 5], # x2
[0, 0, 0, 1, 2, 3]], # y2
[[0, 1, 2, 3, 4, 5], # x3
[5, 4, 3, 2, 1, 0]] # y3
], dtype=float)
# define the test data and weights with the same
# dimensions as function array
data = np.array([
[[0, 1, 2, 3, 4, 5], # x
[0, 1, 2, 2, 1, 0]] # y
], dtype=float)
weight = np.array([
[0.1, 0.2, 0.3, 0, 0, 0] # w
])
# compute distance between points in data and each function:
dist = ((funcs - data) ** 2).sum(axis=1)
# compute weighted error across all functions:
err = (dist * weight).sum(axis=1)
print "Errors:", err
print "Best fit:", np.argmin(err)
Upvotes: 1
Reputation: 40993
You need two elements in order to have a fit, the data(which you already have) and a model space(Linear Models, Gaussian Process, Support Vector Regression). In your case your model has the additional constrain that some data points should be weighted differently than others. May be something like this works from you:
from scipy.interpolate import UnivariateSpline
temp = np.asarray([10, 9.6, 9.3, 9.0, 8.7])
height = np.asarray([129, 145, 167, 190, 213])
f = UnivariateSpline(height, temp)
Now you can evaluate f
wherever you want:
test_points = np.arange(120, 213, 5)
plot(height, temp, 'o', regular_heights, f(test_points), 'x')
Upvotes: 4