Vermin
Vermin

Reputation: 13

Problems with repeated draws of Sobol sequence in scipy

How do I get unique (uncorrelated) Sobol sequences by making repeated calls to random or random_base2?

I'm trying to repeatedly generate 'random' Sobol sequences, but I'm getting the same sequence each time. My reading of the documentation suggested that repeated calls would give me a different sequence ("To continue an existing design, extra points can be obtained by calling again random_base2.").

Here's my code:

import scipy
engine2 = scipy.stats.qmc.Sobol(d=1, scramble=True)
sample_qmc1 = engine2.random_base2(m=14)
print(sample_qmc1)
sample_qmc2 = engine2.random_base2(m=14)
print(sample_qmc2)

Here's example output:

[[0.2534843 ]
 [0.884218  ]
 [0.69113704]
 ...
 [0.69106254]
 [0.88420477]
 [0.25347106]]
[[0.25342032]
 [0.88416122]
 [0.69107275]
 ...
 [0.69111674]
 [0.88426601]
 [0.2535251 ]]

As you can see, the values are almost the same.

reset() doesn't help, I get the same sequence. Changing to random() doesn't help either. Using smaller sample sizes does seem to work.

If I add this line:

sample_qmc3 = engine2.random_base2(m=14)

I get this error:

ValueError: The balance properties of Sobol' points require n to be a power of 2. 24576 points have been previously generated, then: n=24576+2**14=40960. If you still want to do this, the function 'Sobol.random()' can be used.

I tried LatinHypercube and it doesn't have this problem.

I'm using scipy 1.14.1

Anyone go any ideas how I can generate unique Sobol sequences?

Upvotes: 0

Views: 76

Answers (2)

Vermin
Vermin

Reputation: 13

Here's how I solved the problem and an explanation of what I think is going on.

I needed x sequences, so I did (for x=2):

    engine = scipy.stats.qmc.Sobol(d=2,
                                   scramble=True
                                   )
    random_x = engine.random(n=size).T.reshape(2, size)

This gives me this kind of output:

array([[0.30097613, 0.90872708, 0.57342084, ..., 0.57336791, 0.90871991,
        0.30101476],
       [0.73375382, 0.18831677, 0.93608302, ..., 0.26994206, 0.60370303,
        0.06809267]])

The two sequences are uncorrelated.

Here's what I think is happening. For every sequence to be uncorrelated with the other sequences, they have to have 'knowledge' of each other, this is hinted at in the documentation. So when you draw one sequence at a time, there's no knowledge of the prior sequences. When you draw them all at once (as dimensions) they have knowledge of each other, so they're uncorrelated.

Upvotes: 0

Matt Haberland
Matt Haberland

Reputation: 3873

How do I get unique Sobol sequences by making repeated calls to random or random_base2?

Those sequences are unique, but they are not independent. Sobol is using a specific strategy to fill in the space with low discrepancy, and it produces the points in a particular order. After you produce the first 2**14 = 16384 points, the space is already filled pretty finely, so you're trying to fit new points into gaps that are on average ~1/2**14. Each point is bound to be close to at least one previously generated point, and because Sobol' is producing the (slightly different) points in "the same order" with each call, you immediately see which old point each new point is close to. If you want the sequences to look different, you can randomly permute them without destroying the low-discrepancy properties of the sequence.

If you actually want "independent" (for most practical purposes) sequences, use two Sobol engines, each with a different random seed (seed argument in SciPy <1.15; rng preferred in SciPy >= 1.15).

I tried LatinHypercube and it doesn't have this problem.

It works differently. IIRC the LatinHypercube in 1D is basically just np.linspace with some random noise, but it also permutes the sequence so the order does not look the same between calls.

Upvotes: 0

Related Questions