How to estimate mathematical function by input and output values?

Question

I have a number of some N imputes and corresponding outputs of a function for these input values.

What I'd like to is estimate the mathematical function for given outputs and inputs. Is it closer to the form of f(N), f(N^2), log(N), N*lolg(N) or 2^N.

Essentially, what I'd like to do is estimate big O. Thus, n is amount of input data, and the outputs is compute time. So basically I'd like to at least know to which of above mentioned functioned mine function is closer.

Saeid · Accepted Answer

You can use the Least Squares method to find the function that has the least distance to your data.

Assuming you have a list of sample observations of your unknown function in the shape of some order pairs (x,y) or (x,f(x)). You can measure the distance of this unknown function to some known function g using Least Squares method.

distance = 0
for x,y in sample pairs
    distance += ( y - g(x) )^2

As much as this distance gets smaller your unknown function is closer to the known function g.

Now If you want to find the closest function (from a pre-determined list of functions) to your unknown function, you just have to calculate each of those functions distance to your unknown function. Whichever had the least distance is more similar to your unknown function.

Note that this method is an approximation and it isn't 100% accurate but you can increase its accuracy by providing a larger and more comprehensive sample data

Here is a sample Python implementation:

import math

functions = []

functions.append( lambda n: n )                # y = n
functions.append( lambda n: n*n )              # y = n^2
functions.append( lambda n: math.log(n,2) )    # y = log(n)
functions.append( lambda n: n*math.log(n,2) )  # y = n*log(n)
functions.append( lambda n: 2**n )             # y = 2^n

# creating the sample data the unknown function is n + 10*n*log(n)
pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,200,1) ]

# calculating the distance of each function to the sample data
# and add them to a list
distances = [ ]
for func in functions:
    d = 0
    for n,y in pairs:
        d += (func(n) - y)**2 
    distances.append(d)

# finding the minimum value and printing the index   
print distances.index(min(distances))

Output

It means the 4th function is closest to our sample data which is n*log(n).

Note that if we reduce the size of the sample data like this (getting rid of half the sample data):

pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,100,1) ]

This program will print 1 which means that the closest function is n². This shows the importance of the sample data.

How to estimate mathematical function by input and output values?

Answers (2)

Related Questions