Kodiologist

Reputation: 3470

Set R's random seed with a hash

R provides a function set.seed to seed the RNG with an integer. The digest package (from CRAN) can hash objects with a variety of hash algorithms, and can output either an ASCII hex representation of the hash or a vector of raw bytes, but it can't produce an integer directly. How can I use the hash of an arbitrary object to seed the RNG?

Upvotes: 1

Views: 866

Answers (2)

Kodiologist

Reputation: 3470

It looks like set.seed is the only interface for seeding R's RNG, and R integers are always 32 bits, even on 64-bit machines, so we need a 32-bit hash. digest provides several 32-bit hashes, but raw = TRUE is ignored for all of them, so we need to do some string operations on the hex representation of the hash. Putting it all together:

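set.seed.obj = function(x)
   {# Hash x to 8 hex digits, split them into 4 byte pairs, and convert to raw.
    x = as.raw(as.hexmode(substring(
        digest::digest(x, algo = "xxhash32"),
        c(1, 3, 5, 7),
        c(2, 4, 6, 8))))
    # Read the 4 raw bytes back as one 32-bit integer and use it as the seed.
    x = rawConnection(x)
    set.seed(readBin(x, "int"))
    close(x)}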

set.seed.obj("hello world")
print(rnorm(3))
set.seed.obj("goodbye world")
print(rnorm(3))
set.seed.obj("hello world")
print(rnorm(3))

Surprisingly, the first assignment to x is necessary: calling rawConnection on the as.raw(...) expression directly results in Error in rawConnection…: invalid 'description' argument. Evidently R fails while trying to make a string representation of the argument for the connection's description attribute.
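
For what it's worth, readBin is documented to accept a raw vector in place of a connection, so the rawConnection step can probably be skipped altogether. Below is a minimal sketch under that assumption. The helper names hex_to_seed and seed_from_object are made up for illustration, and mapping an NA result to 0 is an arbitrary choice here, not something set.seed prescribes.

```r
# Hypothetical helper: turn a hex string of up to 8 digits into a 32-bit R integer.
hex_to_seed <- function(hex) {
  # Left-pad with zeros in case the hash was printed without leading zeros.
  hex <- paste0(strrep("0", 8L - nchar(hex)), hex)
  # Split into four byte pairs and convert each pair to a raw byte.
  bytes <- as.raw(strtoi(substring(hex, c(1, 3, 5, 7), c(2, 4, 6, 8)), base = 16L))
  # readBin accepts a raw vector directly, so no connection is needed.
  seed <- readBin(bytes, what = "integer", n = 1L, size = 4L, endian = "big")
  # The bit pattern 0x80000000 reads back as NA_integer_; replace it (arbitrarily) with 0.
  if (is.na(seed)) 0L else seed
}

# Hypothetical wrapper: seed the RNG from the xxhash32 digest of any object.
seed_from_object <- function(x) {
  seed <- hex_to_seed(digest::digest(x, algo = "xxhash32"))
  set.seed(seed)
  invisible(seed)
}
```

Fixing endian = "big" should keep the seed the same across platforms, whereas readBin's default uses the machine's native byte order.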

Upvotes: 1

Dirk is no longer here

Reputation: 368181

The interface to set.seed() is indeed given, and only takes an integer. That is a design decision, and it is not a bad one: set.seed(123) is easy to write down, and controlled behaviour afterwards is guaranteed.

If you dig deeper, there is a lot more going on inside the multiple (!!) RNGs used by R. There are several different ones, you can switch between them, and even by default you get (as I recall) different ones for uniform and normal draws. Still, the seeding interface covers both.
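
To see this in a session, RNGkind() reports which generators are currently selected, and a single set.seed call makes both uniform and normal draws reproducible. A small sketch; the helper name draw_both is hypothetical.

```r
# RNGkind() with no arguments returns the current settings without changing them.
print(RNGkind())

# Hypothetical helper: seed once, then draw from both runif and rnorm.
draw_both <- function(seed) {
  set.seed(seed)
  c(runif(2), rnorm(2))
}

# The same seed reproduces both kinds of draws.
identical(draw_both(123), draw_both(123))
```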

And at the C level, there is a much larger (more complicated) data structure at play.
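
At the R level, that state is visible as the integer vector .Random.seed in the global environment, which is much longer than the single integer passed to set.seed. A quick way to look at it, as a sketch (the variable name state is just for illustration):

```r
set.seed(123)
# .Random.seed exists in the global environment once the RNG has been seeded or used.
state <- get(".Random.seed", envir = globalenv())
is.integer(state)
length(state)  # far more than one integer for the default generator
```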

Now, my digest package. It does indeed operate on arbitrary R objects, returning character strings. As such, it does not help with set.seed() directly, as these strings are not integers. But you could, for example, add an intermediate layer where you once again 'hash-map' these character strings to integers.

In short, I think you need to rethink your design a little.

Edit: By request, even if I think this is not the way to do it:

 R> c2i <- function(s) sum(as.integer(charToRaw(s)))
 R> c2i(digest(42))
 [1] 2332
 R> set.seed(c2i(digest(42)))
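
One caveat when summing bytes like this: the sum ignores character order and stays in a small range, so distinct digests can easily map to the same seed. A possible alternative, sketched below, is to parse a prefix of the hex digest as a number; seven hex digits always fit in a non-negative 32-bit integer. The helper name hex_prefix_to_int is hypothetical.

```r
c2i <- function(s) sum(as.integer(charToRaw(s)))

# Byte sums ignore order, so reordered strings collide.
c2i("ab") == c2i("ba")

# Hypothetical alternative: read the first 7 hex digits as an integer.
# 7 digits give at most 0xfffffff = 2^28 - 1, which strtoi can represent.
hex_prefix_to_int <- function(hex) strtoi(substr(hex, 1, 7), base = 16L)

hex_prefix_to_int("ffffffffabc")
# With digest loaded: set.seed(hex_prefix_to_int(digest(42)))
```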

Upvotes: 3
