Most Efficient Sorting Algorithm for Generated Data

Question

I have the following formula: A=(x+x0)^.5 * (y+y0) * (z+z0)^.5

x0, y0, and z0 are constant for a given run, but may change between runs of the program. x, y, and z are randomly generated for an item and are uniform integers in [0, 15]. This means there are 16^3=4096 possible combinations.

I am trying to find the most efficient way to get the percentile of a given A value (x0, y0, and z0 will be given as well). I have two questions:

1. Is there a way to create an analytic formula that will solve for percentile directly, without generating all possible As and sorting them?
2. If not, what is the most efficient way to sort this data, given that I have some information about how it will be structured?

I kind of assumed the answer to #1 is "no" but will be pleasantly surprised if someone can come up with an analytic solution. Proceeding with #2, here is my current progress:

Data will be generated via 3 nested loops:

For x = 0 to 15
   For y = 0 to 15
       For z = 0 to 15
          array(n) = A(x,y,z)
          n=n+1
       Next z
   Next y
Next x
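For readers who want to run it, the loops above can be sketched in Python roughly as follows. The function names `A` and `generate` and the sample offsets (x0=1, y0=2, z0=3) are illustrative assumptions, not part of the original program.

```python
def A(x, y, z, x0, y0, z0):
    # The formula from the question: sqrt(x + x0) * (y + y0) * sqrt(z + z0).
    return (x + x0) ** 0.5 * (y + y0) * (z + z0) ** 0.5

def generate(x0, y0, z0):
    # Same loop order as the pseudocode: x is the outer loop, z the inner one,
    # so the flat index is n = 256*x + 16*y + z.
    return [A(x, y, z, x0, y0, z0)
            for x in range(16)
            for y in range(16)
            for z in range(16)]

values = generate(1, 2, 3)  # arbitrary sample offsets
print(len(values))  # 4096
```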

We know (at least) 3 things about this data:

1. array(0) < array(1) < array(2) ... (z increasing)
2. array(0) < array(16) < array(32) ... (y increasing)
3. array(0) < array(256) < array(512) ... (x increasing)

So far my best working algorithm is a mergesort that starts with list size 16. However, this ignores 2) and 3) above.
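One way to picture this approach is a k-way merge over the 256 runs of 16 values that are already sorted along z. The sketch below uses `heapq.merge` from the standard library as a stand-in for a bottom-up mergesort that starts at run length 16; it is not the exact code described above. It assumes the offsets are positive, so that each run really is ascending, and the helper names are hypothetical.

```python
import heapq

def A(x, y, z, x0, y0, z0):
    # The formula from the question.
    return (x + x0) ** 0.5 * (y + y0) * (z + z0) ** 0.5

def sort_by_merging_runs(x0, y0, z0):
    # Each (x, y) pair yields a run of 16 values that is already ascending in z
    # (assumption: positive offsets, so the function is increasing in z).
    runs = [[A(x, y, z, x0, y0, z0) for z in range(16)]
            for x in range(16)
            for y in range(16)]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*runs))

merged = sort_by_merging_runs(1, 2, 3)  # arbitrary sample offsets
```

Like the mergesort described above, this only exploits the ordering along z.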

Note: My question is about efficiency. I have a solution that is slow but works, so what I'm looking for is the most efficient way to do this.

EDIT: Here is a solution I started to come up with, which feels like it would be the most efficient, but it doesn't work. I'm not sure if it can be salvaged.

Put your values in a 3-dimensional array (x, y, z). Start with (0,0,0) which must be the minimum. The next value must be (1,0,0), (0,1,0), or (0,0,1). Test and add. Let's say it was (1,0,0). Then the next value must be (2,0,0), (0,1,0), or (0,0,1). Continue until you've added all the values in O(n) time.

FLAW: The number of possibilities isn't always constrained to 3. I can't figure out a way to tell the computer which cells are possibilities without killing the efficiency gain. There may be a way, but I just haven't thought of it.
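One possible way to keep track of the candidate cells, offered only as a sketch, is a min-heap of frontier cells plus a set of cells already queued. This does not reach O(n): each step costs O(log k), where k is the current frontier size, so the total is O(n log k). It assumes the function is non-decreasing in each of x, y, and z (true for positive offsets); the names below are hypothetical.

```python
import heapq

def A(x, y, z, x0, y0, z0):
    # The formula from the question.
    return (x + x0) ** 0.5 * (y + y0) * (z + z0) ** 0.5

def sort_by_frontier(x0, y0, z0, size=16):
    start = (0, 0, 0)
    heap = [(A(0, 0, 0, x0, y0, z0), start)]  # candidate cells, smallest first
    seen = {start}                            # cells already pushed once
    out = []
    while heap:
        value, (x, y, z) = heapq.heappop(heap)
        out.append(value)
        # Only the three "next" neighbours of a popped cell become candidates.
        for nxt in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if max(nxt) < size and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (A(*nxt, x0, y0, z0), nxt))
    return out

ordered = sort_by_frontier(1, 2, 3)  # arbitrary sample offsets
```

The heap never holds more than the current frontier, which is much smaller than the full 4096 cells, but whether this beats a plain library sort in practice would need measuring.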

Edit 2: I am still interested in the most efficient sorting algorithm for values generated from a monotonic function, since it is theoretically an interesting question. However, since I first asked whether there was a shortcut to getting the percentile, I have selected the strikingly simple "count the number less than A" as the answer.

rici · Accepted Answer

If all you need to know is the position of A in the sorted list of possibilities, there is actually no need to sort the possibilities (O(n log n)). It's sufficient to count the number of possibilities less than or equal to A (O(n)).
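As a baseline, the plain O(n) count can be sketched like this: one pass over all 4096 triples with no sorting. The function names and the sample offsets (1, 2, 3) are assumptions made for illustration.

```python
def A(x, y, z, x0, y0, z0):
    # The formula from the question.
    return (x + x0) ** 0.5 * (y + y0) * (z + z0) ** 0.5

def count_le_linear(a, x0, y0, z0):
    # Visit every triple once and count those whose value is <= a.
    return sum(1
               for x in range(16)
               for y in range(16)
               for z in range(16)
               if A(x, y, z, x0, y0, z0) <= a)

smallest = A(0, 0, 0, 1, 2, 3)
largest = A(15, 15, 15, 1, 2, 3)
print(count_le_linear(smallest, 1, 2, 3))  # 1
print(count_le_linear(largest, 1, 2, 3))   # 4096
```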

In this case, where the function is monotonic, you can reduce the work even further: given some definite values x' and z', you can solve for y' in A = f(x', y', z'). Then you know that there are max(0, min(16, floor(y') + 1)) triples <x', y, z'> whose value is less than or equal to A.

That solution is quite simple. Given

A=(y' + y0) * ((x'+x0) * (z'+z0))^.5

we have

y' = A / ((x'+x0) * (z'+z0))^.5 - y0

Python (which could be considered pseudocode):

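def gmean(x, y):
    # Geometric mean of the two arguments.
    return (x * y) ** 0.5

def count_le(A, x0, y0, z0):
    # Count the triples (x, y, z) whose value is less than or equal to A.
    count = 0
    for x in range(16):
        for z in range(16):
            gm = gmean(x + x0, z + z0)
            if gm == 0:
                # Every y gives the value 0 here (counted assuming A >= 0).
                count += 16
            else:
                # Largest real y for which the value stays <= A.
                y = A / gm - y0
                if y >= 0:
                    count += min(16, 1 + int(y))
    return count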

To turn the result of count_le into a percentile, you'd have to multiply it by 100/4096.
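A minimal usage sketch, assuming sample offsets (1, 2, 3) and a query value of 100.5 that does not coincide with any generated value. The wrapper `percentile` and the cross-check `count_le_linear` are hypothetical helpers added for illustration; `count_le` is repeated so the block runs on its own.

```python
def count_le(a, x0, y0, z0):
    # Same logic as count_le above: 256 iterations instead of 4096.
    count = 0
    for x in range(16):
        for z in range(16):
            gm = ((x + x0) * (z + z0)) ** 0.5
            if gm == 0:
                count += 16
            else:
                y = a / gm - y0
                if y >= 0:
                    count += min(16, 1 + int(y))
    return count

def percentile(a, x0, y0, z0):
    # Scale the count by 100/4096, as described above.
    return 100.0 * count_le(a, x0, y0, z0) / 4096

def count_le_linear(a, x0, y0, z0):
    # Direct count over all triples, used only to cross-check the shortcut.
    return sum(1 for x in range(16) for y in range(16) for z in range(16)
               if (x + x0) ** 0.5 * (y + y0) * (z + z0) ** 0.5 <= a)

print(percentile(100.5, 1, 2, 3))
```

Because `y` is computed in floating point, a query that lands exactly on a generated value can be off by one in either direction due to rounding; if exact ties matter, comparing against the recomputed value for the boundary `y` is one option.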
