Reputation: 31
I have a number of some N
imputes and corresponding outputs of a function for these input values.
What I'd like to is estimate the mathematical function for given outputs and inputs. Is it closer to the form of f(N)
, f(N^2)
, log(N)
, N*lolg(N)
or 2^N
.
Essentially, what I'd like to do is estimate big O. Thus, n is amount of input data, and the outputs is compute time. So basically I'd like to at least know to which of above mentioned functioned mine function is closer.
Upvotes: 1
Views: 945
Reputation: 32597
Here is a small addendum to sudomakeinstall2's answer.
In the big O notation, you do not care about a constant scaling. Hence, instead of measuring the distance ( y - g(x) )^2
as in his answer, you actually want to measure ( y - k * g(x) )^2
, where k
is the constant scaling that fits best. This k
can be calculated directly as the least squares fit. Here is the modified version, which should be more robust:
...
for func in functions:
#calculate the best k
numerator = 0
denominator = 0
for n,y in pairs:
numerator += func(n) * y
denominator += func(n) * func(n)
k = numerator / denominator
d = 0
for n,y in pairs:
d += (k * func(n) - y)**2
distances.append(d)
...
Upvotes: 2
Reputation: 4265
You can use the Least Squares method to find the function that has the least distance to your data.
Assuming you have a list of sample observations of your unknown function in the shape of some order pairs (x,y)
or (x,f(x))
. You can measure the distance of this unknown function to some known function g using Least Squares method.
distance = 0
for x,y in sample pairs
distance += ( y - g(x) )^2
As much as this distance gets smaller your unknown function is closer to the known function g.
Now If you want to find the closest function (from a pre-determined list of functions) to your unknown function, you just have to calculate each of those functions distance to your unknown function. Whichever had the least distance is more similar to your unknown function.
Note that this method is an approximation and it isn't 100% accurate but you can increase its accuracy by providing a larger and more comprehensive sample data
Here is a sample Python implementation:
import math
functions = []
functions.append( lambda n: n ) # y = n
functions.append( lambda n: n*n ) # y = n^2
functions.append( lambda n: math.log(n,2) ) # y = log(n)
functions.append( lambda n: n*math.log(n,2) ) # y = n*log(n)
functions.append( lambda n: 2**n ) # y = 2^n
# creating the sample data the unknown function is n + 10*n*log(n)
pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,200,1) ]
# calculating the distance of each function to the sample data
# and add them to a list
distances = [ ]
for func in functions:
d = 0
for n,y in pairs:
d += (func(n) - y)**2
distances.append(d)
# finding the minimum value and printing the index
print distances.index(min(distances))
Output
3
It means the 4th function is closest to our sample data which is n*log(n)
.
Note that if we reduce the size of the sample data like this (getting rid of half the sample data):
pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,100,1) ]
This program will print 1
which means that the closest function is n2. This shows the importance of the sample data.
Upvotes: 2