Marcel
Marcel

Reputation: 134

Find the index of N in the sequence of numbers with nondecreasing digits (without reconstructing the sequence)

Consider the sequence of all nonnegative integers that have digits in nondecreasing order, i.e., digits sorted from left to right (A009994 in the OEIS). This sequence starts with 0 and includes numbers like 29, 33 and 112, but not 30, 31, 32 or 105.

Now, given a number N with nondecreasing digits, I need to find its position in the sequence. Alternatively, I need to find how many numbers with nondecreasing digits are smaller than N.

This can be done in O(N) time if we simply generate each number in the sequence until we find N. Furthermore, we can pre-compute the sequence up to some maximum number and then use binary search to find the index in O(log N). But since in my program this computation needs to be repeated thousands of times (inside an inner loop) for different values of N, it would be nice if I had a more efficient algorithm or even a closed formula to compute the index.

In case it matters, in my program N is a four-digit number in base 7, i.e., 0 <= N <= 66667.

The maximum index of N in my program is C(7+4-1, 4) - 1 = 10! / 4! / 6! - 1 = 209.

Edit: Since some contributors are emphasizing how good the solutions involving reconstructing the sequence actually are, I want to note that memory usage is important too. I recognize that said solutions aren't terrible, but they certainly seem wasteful. The somewhat similar problem of finding the index of a permutation has a simple solution that doesn't require reconstructing the sequence.

Upvotes: 0

Views: 276

Answers (2)

cdlane
cdlane

Reputation: 41872

But since in my program this computation needs to be repeated thousands of times (inside an inner loop) for different values of N, it would be nice if I had a more efficient algorithm

Let's do a sanity check of what it costs you in time not constructing and indexing the sequence. Let's implement a simple-minded dictionary-based solution:

from numpy import base_repr

table = {}

for number in range(int('6666', 7) + 1):
    representation = base_repr(number, base=7)

    a, b, c, d, *rest = representation + '7' * 3

    if b >= a and c >= b and d >= c:
        table[int(representation)] = len(table)

Timing one million lookups, on my system this is 15 times faster than your 'general solution'. But 2/5 of that faster time was spent generating the table. Let's go a step further, and statically define the table:

table = {
    0: 0,
    1: 1,
    2: 2,
    3: 3,
    4: 4,
    5: 5,
    # ...
    4666: 204,
    5555: 205,
    5556: 206,
    5566: 207,
    5666: 208,
    6666: 209,
}

Doing a million lookups, on my system this is 25 times faster than your 'general solution'.

We've still a way to go towards finding a closed-formula solution that is also an efficient algorithm.

Upvotes: 0

Marcel
Marcel

Reputation: 134

I've found a closed-formula solution expanding on the notion that A009994 is the sequence of combinations with repetitions in lexicographical order.

The solution is summarize by the following Python code:

>>> from math import factorial as fact
>>> C = lambda n, k: n>=k and fact(n) // fact(k) // fact(n-k) or 0
>>> I = lambda a, b, c, d: C(10, 4) - C(9-a, 4) - C(8-b, 3) - C(7-c, 2) - C(6-d, 1) - 1
>>> I(0, 0, 0, 0)
0
>>> I(2, 2, 3, 5)
147
>>> I(6, 6, 6, 6)
209

Explanation

Let's consider the sequence S of 3-digit (K=3) numbers with digits sorted from left to right in base 10 (B=10)

Any number N in S can be uniquely identified by the count of each digit occurring in it, e.g. if we have 1 occurrence of digit "2" and 2 occurrences of digit "3", then N must be "233".

Knowing all digits available, we can use the stars & bars notation to represent N. In base 10 we have 10 digits available so we need B-1=9 bars to separate the slots:

|||||||||

For N = 233, putting the stars in their slots we have:

||*|**||||||

Now we have a list of B-1+K=12 objects (9 bars + 3 stars). The positions of the stars in the list (starting from 0) are: (2, 4, 5).

In short, to convert any N to its stars & bars 3-tuple representation, we can just add each digit in N to its position in N itself:

233 -> (2+0, 3+1, 3+2) -> (2, 4, 5)   (formula I)

Again, this 3-tuple uniquely identifies any 3-digit number in base 10 with digits sorted. Conversely, in order to generate any 3-digit number with digits sorted, we could just pick three distinct items from the set of 12 positions {0, 1, 2, ..., 10, 11}. There's a one-to-one correspondence between S and the set T of all possible 3-tuples with distinct numbers from 0 to 11.

The advantage of this representation is that repetition cannot occur since two stars cannot be in the same position. This means that T is simply the set of 3-combinations of 12 items.

It's also easy to see that sorting T lexicographically is equivalent to sorting S numerically.

From this point, we can use the textbook formulas for combinations. This is what I used to find the length of S, as mentioned in the question. What I didn't know is that the formula to find the position of a combination in the list of all possible combinations sorted by ascending order is:

I = C(n, k) - C(n-p1, k) - C(n-p2, k-1) - ... - C(n-pk, 1)  (formula II)

where:

  • C is the binomial coefficient: C(n, k) = n! / (k! * (n-k)!)
  • n is the total number of items available (in our case n=B-1+K=12)
  • k is the number of items in each combination (k=K=3)
  • p1, p2, ..., pk are the items selected (i.e. the items of our 3-tuple)

Finally, plugging formula I inside formula II, we have:

I = C(n, k) - C(n-d1, k) - C(n-d2+1, k-1) - ... - C(n-dk+k-1, 1)

where d1, d2, ..., dk are the digits of N

Caution: when N has the highest digit B-1=9, this formula will include C(n, k) where n < k, which, by the definition above, includes the factorial of a negative number, which is not defined. For this reason, we must substitute these terms with zero, since in this case the set of k-combinations of n elements is empty.

And this is the general solution for the question.

Upvotes: 1

Related Questions