The Wind-Up Bird
The Wind-Up Bird

Reputation: 661

"Deterministic" pseudorandom number generation

I am in a situation where I need to generate a very large (~10^16 elements) random matrix with a certain random sparsity pattern. Obviously storing all of these elements is completely infeasible. Only a few of the elements are needed at any given time, so they can be drawn on-demand - however, once an element is stored, it may be needed later, and it's important that the same value is used. i.e, elements cannot be thrown out and randomly redrawn - once a random element is drawn, it needs to be saved.

Based on the problem itself, there are potentially some clever ways to handle this that I won't get into. However, a colleague says that it should be possible to deterministically generate these random numbers as needed by using a pseudorandom number generator with seed given by the index in the matrix, i.e. use i + N*j for element (i, j) of the matrix where the matrix is size N*N. This would not be calling the rand() function, but using the underlying pseudorandom function with specific arguments to deterministically generate previously drawn random values. Then no numbers would need to be saved, and they could be deterministically redrawn on demand.

My understanding of PRG's was that for a sequence of numbers appear random, you must fix the seed. Does the above approach make sense? It seems to me to be like repeatedly re-seeding the PRG and just taking the first element.

Upvotes: 2

Views: 1583

Answers (2)

B. M.
B. M.

Reputation: 18628

Not a precise answer, but some tryes.

A hash function seems to be a simple and efficient mean to achieve your goal.

Here are some good ideas about integer to integer hash-functions.

From this article I try that :

from numba import uint64, njit 
import pylab as pl

@njit(uint64(uint64,uint64))    
def hash64(i,j) :
    x= i + (j << 32)
    x = (x ^ (x >> 30)) * (0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * (0x94d049bb133111eb);
    x = x ^ (x >> 31);
    return x;  

n=1000    
im=[[hash64(i,j) for i in range(n)] for j in range(n)]
pl.subplot(121)
pl.imshow(im)
pl.colorbar()
pl.subplot(122)
pl.hist(np.array(im).ravel(),bins=100)
pl.show()  

This numba hash64 function compute a hash code in ~ 200 ns.

And this plot (even it demonstrates nothing) shows that this function could be a good candidate.

enter image description here

By contrast the python hash function (hash((i,j)) on tuple) do not pass the test : enter image description here

Here for A Knuth Generator :

enter image description here

And some benchmarks :

In [61]: %timeit hash64(0,1)
215 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [62]: %timeit np.random.seed(0+1<<30);a=np.random.randint(2**30)
4.18 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [63]:%timeit  hash((0,1))
102 ns ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) 

Upvotes: 1

Severin Pappadeux
Severin Pappadeux

Reputation: 20080

Basically, you want 1016 ~ 253 random numbers which uniquely determined by matrix element linear index.

Simplest random number generator would be 64bit Linear Congruential one. Basically any reasonable one with 63-64bits length and full period will work. Code below implements Knuth LCG with seed equal to matrix element linear index. Another good property of the LCG is jump-ahead with logarithmic complexity - jump ~ log2(N)

Code

import numpy as np

A = np.uint64(6364136223846793005) # multiplier
C = np.uint64(1442695040888963407) # increment
K = np.float64(5.421010862427522170037E-20) # 1/2^64

def K64(seed):
    """
    Knuth 64bit MMIX LCG, return uint64 in the range [0...2^64)
    """
    return A*np.uint64(seed) + C

def KLCG64(seed):
    """
    Knuth 64bit MMIX LCG, return float64 in the range [0.0...1.0)
    """
    #return np.float64(K64(seed))*K # here is proper implementation, but slower than inlined code below
    return np.float64(A*np.uint64(seed) + C)*K

def randomMat(i, j, N):
    """
    return random element of the matrix
    """
    return KLCG64(np.uint64(i) + np.uint64(j)*np.uint64(N))

np.seterr(over='ignore') # important! LCG relies on unsigned overflow for mod operation

print(randomMat(123738, 9276, 735111))

Upvotes: 0

Related Questions