MaiaVictor
MaiaVictor

Reputation: 53037

How to generate a function that will algebraically encode a sequence?

Is there any way to generate a function F that, given a sequence, such as:

seq = [1 2 4 3 0 5 4 2 6]

Then F(seq) will return a function that generates that sequence? That is,

F(seq)(0) = 1
F(seq)(1) = 2
F(seq)(2) = 4
... and so on

Also, if it is, what is the function of lowest complexity that does so, and what is the complexity of the generated functions?

EDIT It seems like I'm not clear, so I'll try to exemplify:

F(seq([1 3 5 7 9])} 
# returns something like:
F(x) = 1 + 2*x
# limited to the domain x ∈ [1 2 3 4 5]

In other words, I want to compute a function that can be used to algebraically, using mathematical functions such as +, *, etc, restore a sequence of integers, even if you cleaned it from memory. I don't know if it is possible, but, as one could easily code an approximation for such function for trivial cases, I'm wondering how far it goes and if there is some actual research concerning that.

EDIT 2 Answering another question, I'm only interested in sequences of integers - if that is important.

Please let me know if it is still not clear!

Upvotes: 0

Views: 164

Answers (3)

Joni
Joni

Reputation: 111349

If the sequence comes from a polynomial with a low degree, an easy way to find the unique polynomial that generates it is using Newton's series. Constructing the polynomial for a n numbers has O(n²) time complexity, and evaluating it has O(n).

In Newton's series the polynomial is expressed in terms of x, x(x-1), x(x-1)(x-2) etc instead of the more familiar x, x², x³. To get the coefficients, basically you compute the differences between subsequent items in the sequence, then the differences between the differences, until only one is left or you get a sequence of all zeros. The numbers you get along the bottom, divided by factorial of the degree of the term, give you the coefficients. For example with the first sequence you get these differences:

1  2  4  3  0  5   4   2    6
   1  2 -1 -3  5  -1  -2    4 
      1 -3 -2  8  -6  -1    6
        -4  1 10 -14   5    7
            5  9 -24  19    2
               4 -33  43  -17
                 -37  76  -60
                     113 -136
                         -249

The polynomial that generates this sequence is therefore:

f(x) = 1 + x(1 + (x-1)(1/2 + (x-2)(-4/6 + (x-3)(5/24 + (x-4)(4/120 
         + (x-5)(-37/720 + (x-6)(113/5040 + (x-7)(-249/40320))))))))

It's the same polynomial you get using other techniques, like Lagrange interpolation; this is just the easiest way to generate it as you get the coefficients for a polynomial form that can be evaluated with Horner's method, unlike the Lagrange form for example.

Upvotes: 3

dreamzor
dreamzor

Reputation: 5925

There is no magic if you say that the sequence could be completely random. And yet, it is always possible, but won't save you memory. Any interpolation method requires the same amount of memory in the worst case. Because, if it didn't, it would be possible to compress everything to a single bit.

On the other hand, it is sometimes possible to use a brute force, some heuristics (like genetic algorithms), or numerical methods to reproduce some kind of mathematical expression having a specified type, but good luck with that :)

Just use some archiving tools instead in order to save memory usage.

I think it will be useful for you to read about this: http://en.wikipedia.org/wiki/Entropy_(information_theory)

Upvotes: 2

Traklon
Traklon

Reputation: 301

Well, if you just want to know a function with "+ and *", that is to say, a polynomial, you can go and check Wikipedia for Lagrange Polynomial (https://en.wikipedia.org/wiki/Lagrange_polynomial). It gives you the lowest degree polynomial that encodes your sequence.

Unfortenately, you probably won't be able to store less than before, as the probability of the polynom being of degree d=n-1 where n is the size of the array is very high with random integers. Furthermore, you will have to store rational numbers instead of integers. And finally, the access to any number of the array will be in O(d) (using Horner algorithm for polynomial evaluation), in comparison to O(1) with the array.

Nevertheless, if you know that your sequences may be very simple and very long, it might be an option.

Upvotes: 3

Related Questions