Neil C. Obremski
Neil C. Obremski

Reputation: 20244

Random contiguous slice of list in Python based on a single random integer

Using a single random number and a list, how would you return a random slice of that list?

For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:

  1. [ ]
  2. [ 0 ]
  3. [ 0, 1 ]
  4. [ 0, 1, 2 ]
  5. [ 1 ]
  6. [ 1, 2]
  7. [ 2 ]

Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.

I need it that way, to ensure these 7 possibilities have equal probability.

Upvotes: 3

Views: 1935

Answers (4)

MvG
MvG

Reputation: 60868

Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by

  • The empty slice is before all other slices
  • Non-empty slices are ordered by their starting point
  • Slices with the same starting point are ordered by their endpoint

So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n] and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index, and solve that for k, you get some formula involving square roots. You could then use the ceiling function to turn that into an integral value for k corresponding to the last index for that starting point. And once you know k, computing the end point is rather easy.

But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.

Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:

from math import ceil, sqrt
n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))

The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.

The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.

You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:

from math import ceil, sqrt
n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))

Upvotes: 3

Konrad Talik
Konrad Talik

Reputation: 916

there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.

It is difficult to say what method is best but if you're only interested in binding single random number to your contiguous slice you can use modulo.

Given a list l and a single random nubmer r you can get your contiguous slice like that:

l[r % len(l) : some_sparkling_transformation(r) % len(l)]

where some_sparkling_transformation(r) is essential. It depents on your needs but since I don't see any special requirements in your question it could be for example:

l[r % len(l) : (2 * r) % len(l)]

The most important thing here is that both left and right edges of the slice are correlated to r. This makes a problem to define such contiguous slices that wont follow any observable pattern. Above example (with 2 * r) produces slices that are always empty lists or follow a pattern of [a : 2 * a].

Let's use some intuition. We know that we want to find a good random representation of the number r in a form of contiguous slice. It cames out that we need to find two numbers: a and b that are respectively left and right edges of the slice. Assuming that r is a good random number (we like it in some way) we can say that a = r % len(l) is a good approach.

Let's now try to find b. The best way to generate another nice random number will be to use random number generator (random or numpy) which supports seeding (both of them). Example with random module:

import random
def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l)+1))
    b = int(random.uniform(0, len(l)+1))
    a, b = sorted([a, b])
    return l[a:b]

Good luck and have fun!

Upvotes: 0

Douglas Zare
Douglas Zare

Reputation: 3316

It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.

There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.

Generate a random integer x from 0 to n*(n+1)/2.

If x=0, return the empty list. Otherwise, x is unformly distributed from 1 through n(n+1)/2.

Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.

Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...

Return the interval starting at index s and ending at index e.

(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...

import random
import math

n=10

x = random.randint(0,n*(n+1)/2)

if (x==0):
    print(range(n)[0:0]) // empty set
    exit()

e = int(math.floor(math.sqrt(2*x)-0.5))
s = int(x-1 - (e*(e+1)/2))

print(range(n)[s:e+1]) // starting at s, ending at e, inclusive

Upvotes: 2

user
user

Reputation: 5696

First create all possible slice indexes.

[0:0], [1:1], etc are equivalent, so we include only one of those.

Finally you pick a random index couple, and apply it.

import random

l = [0, 1, 2]

combination_couples = [(0, 0)]
length = len(l)

# Creates all index couples.
for j in range(1, length+1):
    for i in range(j):
        combination_couples.append((i, j))

print(combination_couples)

rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]

print(final_slice) 

To ensure we got them all:

for i in combination_couples:
    print(l[i[0]:i[1]])

Alternatively, with some math...

For a length-3 list there are 0 to 3 possible index numbers, that is n=4. You have 2 of them, that is k=2. First index has to be smaller than second, therefor we need to calculate the combinations as described here.

from math import factorial as f    

def total_combinations(n, k=2):
    result = 1

    for i in range(1, k+1):
        result *= n - k + i
    result /= f(k)
    # We add plus 1 since we included [0:0] as well.
    return result + 1

print(total_combinations(n=4))    # Prints 7 as expected.

Upvotes: 1

Related Questions