Reputation: 1809

Developing an indexing scheme for list of all non-decreasing sequences

Suppose that we are considering a sorted list of all non-decreasing sequences with values in the range (1, max_num) and num_slots elements in each sequence, how can I find the index of some given member sequence in O(1) time complexity? I am not actually given the entire list up-front, I just want to find the index of some member sequence were the list of all sequences to exist.

For a concrete example, suppose that max_num = 3 and num_slots = 4. Then there are 15 sequences (or generally, there are (max_num + num_slots - 1) choose (num_slots) sequences):

[[1, 1, 1, 1],
 [1, 1, 1, 2],
 [1, 1, 1, 3],
 [1, 1, 2, 2],
 [1, 1, 2, 3],
 [1, 1, 3, 3],
 [1, 2, 2, 2],
 [1, 2, 2, 3],
 [1, 2, 3, 3],
 [1, 3, 3, 3],
 [2, 2, 2, 2],
 [2, 2, 2, 3],
 [2, 2, 3, 3],
 [2, 3, 3, 3],
 [3, 3, 3, 3]]

So given inputs of a sequence like [1, 2, 2, 3] along with the information max_num = 3, I am trying to write a function that would return its correct index of 7. I do not actually have the list of all sequences to work with.

Background info

I have come up with an algorithm to generate all non-decreasing sequences I care about, but this doesn't seem completely relevant for generating the index of a particular member sequence without the whole list of sequences materialized.

def gen(max_num, num_slots, l = None): 
    if l is None: 
        l = [[1] * num_slots] 
    cur = l[-1].copy() 
    for i in reversed(range(num_slots)): 
        if cur[i] < max_num: 
            cur[i] += 1 
            for j in range(i+1, num_slots): 
                cur[j] = cur[i] 
            l.append(cur) 
            return gen(max_num, num_slots, l) 
    return l

Upvotes: 4

Answers (4)

גלעד ברקן

Reputation: 23945

There is bijection from the k-subsets of {1...n} (with repetition) to k-subsets of {1...n + k − 1} (without repetition) by mapping {c_0, c_1...c_(k−1)} to {c_0, c_(1+1), c_(2+2)...c_(k−1+k−1)} (see here).

Once converted, just use your favourite combination ranking utility.

[3, 3, 3, 3]  -->  [3, 4, 5, 6]
[2, 3, 3, 3]  -->  [2, 4, 5, 6]
[2, 2, 3, 3]  -->  [2, 3, 5, 6]
[2, 2, 2, 3]  -->  [2, 3, 4, 6]
[2, 2, 2, 2]  -->  [2, 3, 4, 5]
[1, 3, 3, 3]  -->  [1, 4, 5, 6]
[1, 2, 3, 3]  -->  [1, 3, 5, 6]
[1, 2, 2, 3]  -->  [1, 3, 4, 6]
[1, 2, 2, 2]  -->  [1, 3, 4, 5]
[1, 1, 3, 3]  -->  [1, 2, 5, 6]
[1, 1, 2, 3]  -->  [1, 2, 4, 6]
[1, 1, 2, 2]  -->  [1, 2, 4, 5]
[1, 1, 1, 3]  -->  [1, 2, 3, 6]
[1, 1, 1, 2]  -->  [1, 2, 3, 5]
[1, 1, 1, 1]  -->  [1, 2, 3, 4]

import pyncomb

def convert(m, S):
  return (m + len(S) - 1, [ x-1 + i for x,i in zip(S, list(xrange(len(S)))) ])

def rank(m, S):
  k, s = convert(m, S)
  return pyncomb.ksubsetcolex.rank(k, s)

print rank(3, [1,2,2,3])
# 7

Upvotes: 1

Rocky Li

Reputation: 5958

I'll elaborate on @DavidFrank's answer on why it is O(length+max_num) and give a more easily understand example (a bit more complex too).

To start with, observe:

Assume the total series possibility in F(length, max_num) = X

Then for all of possibilities in X that starts with 1, e.g. [1, ....], we have a count of F(length-1, max_num) within this group.

For all the possibility in X that does not start with 1, e.g. [2, ....] or [3, ....], we have a count of F(length, max_num-1).

Thus we can use recursion to get this in O(length*max_num) (can become O(length+max_num) if we use memoization) number of complexity:

# This calculate the total number of X of possible entry given (length, max_num)
def calc_sum(length, max_num):
    if max_num == 1:
        return 1
    elif length == 1:
        return max_num
    else:
        total = calc_sum(length-1, max_num) + calc_sum(length, max_num-1)
        return total

Now we examine the result to see if we can make it O(1):

# This is clearly not going to make it O(1), so now we need some generalizations to NOT run this recursion.
import numpy as np
arr = np.zeros((6,6))
for i in range(6):
    for j in range(6):
        arr[i, j] = calc_sum(i+1, j+1)
print(arr)

The result is:

[[  1.   2.   3.   4.   5.   6.]
 [  1.   3.   6.  10.  15.  21.]
 [  1.   4.  10.  20.  35.  56.]
 [  1.   5.  15.  35.  70. 126.]
 [  1.   6.  21.  56. 126. 252.]
 [  1.   7.  28.  84. 210. 462.]]

This is a pascal's triangle, if you look diagonally to the top right. The diagonals of pascal's triangle are defined by (x choose y)

This makes it clear that it cannot be O(1), and will at least be O(length+max_num) because this is the general complexity of (Choose) function.

We've went all the way to prove that an O(1) solution is impossible, unless we constrain (length + max_num) to be constant.

# We can expand by solving it now:
from scipy.special import comb # this is choose function.

def get_index(my_list, max_num):
    my_list = np.array(my_list)
    if len(my_list) == 1:
        return my_list[0] - 1
    elif my_list[0] == 1:
        return get_index(my_list[1:], max_num)
    elif my_list[0] != 1:
        return get_index(my_list - 1, max_num - 1) + comb(len(my_list)-2+max_num, max_num-1)

get_index([1,2,2,3],3) # 7

The aggregated complexity of the final function with the comb() is still O(length + max_num) as the complexity of everything outside comb is O(length + max_num) as well.

Upvotes: 1

David Frank

Reputation: 6092

This one is O(|seq| + max_num). Note that this is still much faster than the naive generate all and search approach, which is exponential in |seq|.

The idea is that you count the sequences before the input sequence. For example, you want to know what's the index of [2, 4, 5, 6] when max_num = 6.

Count [1, *, *, *]
Count [2, 2, *, *]
Count [2, 3, *, *]
(Note: you cannot count [2, 4, *, *], because then you would include [2, 4, 6, 6] which comes after your input. You should always go until one less than your input at the given index)
Count [2, 4, 4, *]
Count [2, 4, 5, 5]

(for each row, you can use your formula, (max_num + num_slots - 1) choose (num_slots) and sum them up)

def combinations(slots, available):
    return choose(slots + available - 1, slots)

def find_index(seq, max_num):
    res = 0
    for digit_index in xrange(len(seq)):
        prev = seq[digit_index - 1] if digit_index > 0 else 1
        for digit in xrange(prev, seq[digit_index]):
            res += combinations(len(seq) - digit_index - 1, max_num - digit + 1)
    return res


print find_index([1, 2, 2, 3], 3)

Upvotes: 4

AShelly

Reputation: 35580

For each digit, find the difference between that and the lowest digit. Add 1 for each changed position to the right of any altered digit

idx = 0;
for i in range(0,num_slots):
    d = SEQ[i]
    idx += d-min_num
    if (d > min_num):
        idx += num_slots-1 - i

For example:
[1,1,1,3] is 0 + 0 + 0 + (2+0) or 2
[1,2,3,3] is 0 + (1+2) + (2+1) + (2+0) or 8
[3,3,3,3] is (2+3) + (2+2) + (2+1) + (2+0) or 14

Upvotes: 0

Developing an indexing scheme for list of all non-decreasing sequences

Answers (4)

Related Questions