KaptajnKasper

Reputation: 170

Curious behavior of set.seed inside a function

Something I came across today that I don't quite understand. The setup is that I want to generate some uniformly distributed points in the plane and then assign each point an arrival rate. I want to be able to reproduce the same points while assigning different arrival rates, and I figured I could use the set.seed function for this.

library(dplyr)
library(ggplot2)

seed = NULL
no_of_points = 50 
interval = c("min" = -10, "max" = 10)
arv = c("min" = 1/80, "max" = 1)

plot_data <- function() {
  id <- 1:no_of_points
  # setting the seed here to be able to reproduce if desired
  set.seed(seed)
  x <- runif(no_of_points, min = interval["min"], max = interval["max"])
  y <- runif(no_of_points, min = interval["min"], max = interval["max"])
  # resetting the seed to give "random" arrival rates regardless of the seed
  set.seed(NULL)
  arrival_rate <- runif(no_of_points, min = arv["min"], max = arv["max"])
  
  # return the tibble visibly instead of as the invisible value of an assignment
  tibble(
    "Demand point id" = as.character(id),
    "x" = x, 
    "y" = y, 
    "Arrival rate" = arrival_rate
  )
}

ggplot(plot_data()) +
  geom_point(aes(x, y, size = `Arrival rate`))

This works fine when I set a seed: I get a plot like this, which is what I would expect.

With an integer value for the seed

However, when I have seed = NULL, as in the example code, I get a plot like this, where the arrival rates seem to be correlated with the x-coordinate.

With seed = NULL

How can this be explained? Additionally, I tried running the same code outside of a function, and then I get the expected behavior. So I suspect it has something to do with the seed being set inside a function.

Upvotes: 1

Views: 169

Answers (1)

Dan Adams

Reputation: 5204

I don't think set.seed(NULL) is doing what you expect. According to ?set.seed, a NULL seed re-initializes the generator as if no seed had yet been set, and a new seed is then created from the current time and the process ID. In this case I think the two calls, made a fraction of a second apart, end up with the same or nearly the same seed. Therefore the first batch of random numbers generated after the first set.seed(NULL) (x) is correlated with the first batch generated after the second set.seed(NULL) (Arrival rate), but not with the second batch after the first call (y).

The simple example below shows that the nth batch generated after setting a particular seed is correlated with the nth batch generated after setting that same seed again, and that using NULL and NULL behaves much like using 1 and 1.

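f <- function(s1 = NULL, s2 = NULL) {

  # three batches of draws after setting the first seed
  set.seed(s1)
  a <- runif(50)
  b <- runif(50)
  c <- runif(50)

  # three batches of draws after setting the second seed
  set.seed(s2)
  d <- runif(50)
  e <- runif(50)
  f <- runif(50)

  x <- data.frame(a, b, c, d, e, f)

  # scatterplot matrix: every batch plotted against every other batch
  plot(x)

}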

f(NULL, NULL)

f(1, 1)

f(1, 2)

Created on 2022-01-04 by the reprex package (v2.0.1)
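
One possible way around this, as a rough sketch rather than the definitive fix: never call set.seed(NULL) at all. Seed only when a seed is supplied, restore the previous RNG state once the points are drawn, and take the arrival rates from the ongoing random stream. The names seeded_points and make_plot_data are hypothetical helpers of mine, not from the question, and I use a plain data.frame with hard-coded ranges to keep the example self-contained.

```r
# Hypothetical helper: draw n uniform (x, y) points. With a non-NULL seed the
# points are reproducible, and the caller's RNG state is put back on exit.
seeded_points <- function(n, lo, hi, seed = NULL) {
  if (!is.null(seed)) {
    # Make sure an RNG state exists so there is something to save.
    if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) runif(1)
    old_state <- get(".Random.seed", envir = globalenv())
    on.exit(assign(".Random.seed", old_state, envir = globalenv()), add = TRUE)
    set.seed(seed)
  }
  data.frame(x = runif(n, lo, hi), y = runif(n, lo, hi))
}

# Hypothetical replacement for plot_data(): ranges copied from the question.
make_plot_data <- function(n = 50, seed = NULL) {
  pts <- seeded_points(n, -10, 10, seed)
  # Drawn from the ongoing RNG stream; no second set.seed() call is needed.
  pts$arrival_rate <- runif(n, min = 1 / 80, max = 1)
  pts
}

d1 <- make_plot_data(seed = 42)
d2 <- make_plot_data(seed = 42)
identical(d1$x, d2$x)                        # same points for the same seed
identical(d1$arrival_rate, d2$arrival_rate)  # arrival rates are drawn afresh
```

With seed = NULL nothing is seeded, so x, y and the arrival rates all come from the same uninterrupted stream. If you would rather not touch .Random.seed directly, I believe the withr package offers a with_seed() wrapper that does similar bookkeeping.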

Upvotes: 3
