Kalev Maricq
Kalev Maricq

Reputation: 617

Most Efficient Sorting Algorithm for Generated Data

I have the following formula: A=(x+x0)^.5 * (y+y0) * (z+z0)^.5

x0, y0, and z0 are constant for a given run, but may change between runs of the program. x, y, and z are randomly generated for an item and are uniform integers in [0, 15]. This means there are 16^3=4096 possible combinations.

I am trying to find the most efficient way to get the percentile of a given A value (x0, y0, and z0 will be given as well). I have two questions:

  1. Is there a way to create an analytic formula that will solve for percentile directly, without generating all possible As and sorting them?
  2. If not, what is the most efficient way to sort this data, given that I have some information about how it will be structured?

I kind of assumed the answer to #1 is "no" but will be pleasantly surprised if someone can come up with an analytic solution. Proceeding with #2, here is my current progress:

Data will be generated via 3 nested loops:

For x = 0 to 15
   For y = 0 to 15
       For z = 0 to 15
          array(n) = A(x,y,z)
          n=n+1
       Next z
   Next y
Next x

We know (at least) 3 things about this data:

  1. array(0) < array(1) < array(2)...
  2. array(0) < array(16) < array(32) ...
  3. array(0) < array(256) < array(512)...

So far my best working algorithm is a mergesort that starts with list size 16. However this ignored 2) and 3) above.

Note: My question is about efficiency. I have a solution, that is slow, but works, so what I'm looking for is the most efficient way to do this.

EDIT: Here is a solution I started to come up with, which feels like it would be the most efficient, but it doesn't work. I'm not sure if it can be salvaged.

Put your values in a 3-dimensional array (x, y, z). Start with (0,0,0) which must be the minimum. The next value must be (1,0,0), (0,1,0), or (0,0,1). Test and add. Let's say it was (1,0,0). Then the next value must be (2,0,0), (0,1,0), or (0,0,1). Continue until you've added all the values in O(n) time.

FLAW: The number of possibilities isn't always constrained to 3. I can't figure out a way to tell the computer which cells are possibilities without killing the efficiency gain. There may be a way, but I just haven't thought of it.

Edit 2: I am still interested in the most efficient sorting algorithm for values generated from a monotonic function, since it is theoretically an interesting question. However, since I asked first if there was a shortcut to getting percentile, I have select the strikingly simple "count the number less than A" as the answer.

Upvotes: 3

Views: 146

Answers (2)

rici
rici

Reputation: 241781

If all you need to know is the position of A in the sorted list of possibilities, there is actually no need to sort the possibilities (O(n log n)). It's sufficient to count the number of possibilities less than or equal to A (O(n)).

In this case, where the function is monotonic, you can reduce the work even further: given some definite values x' and z', you can solve for y' in A = f(x', y', z'). Then you know that there are max(0, min(16, floor(y') + 1)) triples <x', y, z'> whose value is less than or equal to A.

That solution is quite simple. Given

A=(y' + y0) * ((x'+x0) * (z'+z0))^.5

we have

y' = A / ((x'+x0) * (z'+z0))^.5 - y0

Python (which could be considered pseudocode):

def gmean(x, y):
    return (x * y) ** 0.5

def count_le(A, x0, y0, z0):
    count = 0
    for x in range(16):
        for z in range(16):
            gm = gmean(x + x0, z + z0)
            if gm == 0:
                count += 16
            else:
                y = A / gm - y0
                if y >= 0:
                    count += min(16, 1 + int(y))
    return count

To turn the result of count_le into a percentile, you'd have to multiply it by 100/4096.

Upvotes: 2

wookie919
wookie919

Reputation: 3134

Interesting problem!

Here's one idea, which may or may not be the most efficient.

Initialize a min-heap with A(0, 0, 0)
numItems = 0
While True:
    A(x, y, z) = pop minimum from heap
    numItems = numItems + 1
    If A(x, y, z) matches given A value:
        break
    else:
        Add to heap A(x + 1, y, z)
        Add to heap A(x, y + 1, z)
        Add to heap A(x, y, z + 1)

Note that you need to maintain a set of flags to ensure that no duplicates are added to the heap. This can be done in O(1) time e.g. Flags[x][y][z] = True when A(x,y,z) is added to the heap. Also another minor note to perform some boundary checks when adding to the heap.

Pop minimum takes O(logn) time. Adding to the heap takes O(logn) time. Thus the worst case time complexity is still O(nlogn).

The advantages are:

  • You can stop as soon as the given A value is found. i.e. you don't need to calculate all possible A values, and you certainly don't need to sort them all.
  • If the given A value is large, you can work with a max-heap.

Upvotes: 1

Related Questions