Something I came across today that I don't quite understand. The setup is that I want to generate some uniformly distributed points in the plane, and afterwards assign each point an arrival rate. I want to be able to reproduce the same points but assign different arrival rates. I figured I could use the set.seed function for this.
library(dplyr)
library(ggplot2)
seed = NULL
no_of_points = 50
interval = c("min" = -10, "max" = 10)
arv = c("min" = 1/80, "max" = 1)
plot_data <- function() {
  id <- 1:no_of_points
  # setting the seed here to be able to reproduce if desired
  set.seed(seed)
  x <- runif(no_of_points, min = interval["min"], max = interval["max"])
  y <- runif(no_of_points, min = interval["min"], max = interval["max"])
  # resetting the seed to give "random" arrival rates regardless of the seed
  set.seed(NULL)
  arrival_rate <- runif(no_of_points, min = arv["min"], max = arv["max"])
  data <- tibble(
    "Demand point id" = as.character(id),
    "x" = x,
    "y" = y,
    "Arrival rate" = arrival_rate
  )
}
ggplot(plot_data()) +
  geom_point(aes(x, y, size = `Arrival rate`))
This works fine when I set a seed: I get a plot like this, which is what I would expect.
However, when I have seed = NULL, as in the example code, I get a plot like this, where the arrival rates seem to be correlated with the x-axis.
How can this be explained? Additionally, I tried running the same code outside a function, and then I get the expected behavior. So I suspect it has something to do with the seed being set inside a function.
I don't think set.seed(NULL) is doing what you expect. In this case I think NULL is initializing the exact same random seed both times you call it. Therefore, the first random number generation after calling set.seed(NULL) (x) is correlated with the first random number generation after you call set.seed(NULL) again (Arrival rate), but not with the second generation of the first instance (y). In this simple example, you can see that the nth random generation after setting a particular seed is correlated with the nth random generation after setting that same seed again, and that using NULL and NULL is basically the same as using 1 and 1.
f <- function(s1 = NULL, s2 = NULL) {
  set.seed(s1)
  a <- runif(50)
  b <- runif(50)
  c <- runif(50)
  set.seed(s2)
  d <- runif(50)
  e <- runif(50)
  f <- runif(50)
  x <- data.frame(a, b, c, d, e, f)
  plot(x)
}
f(NULL, NULL)
f(1, 1)
f(1, 2)
Created on 2022-01-04 by the reprex package (v2.0.1)
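If the goal is reproducible coordinates with arrival rates that still vary between calls, one possible workaround is to avoid set.seed(NULL) altogether: save the global RNG state, seed only the coordinate draws, and restore the saved state before drawing the arrival rates. The sketch below is only an illustration under that assumption. The names with_local_seed and make_points are hypothetical helpers, not part of the original code, and the ranges are copied from the question. The withr package offers with_seed(), which follows a similar idea.

```r
# Hypothetical helper: evaluate `expr` with the RNG seeded by `seed`, then put
# the global RNG state back as it was, so later draws are unaffected.
with_local_seed <- function(seed, expr) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # force R to create an initial RNG state
  }
  old_state <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit(assign(".Random.seed", old_state, envir = globalenv()))
  set.seed(seed)
  force(expr)  # `expr` is lazy, so it is only evaluated here, after seeding
}

# Hypothetical replacement for the data-generating part of plot_data().
make_points <- function(n = 50, seed = NULL) {
  draw_xy <- function() {
    data.frame(x = runif(n, min = -10, max = 10),
               y = runif(n, min = -10, max = 10))
  }
  # Seed only the coordinates, and only when a seed was supplied.
  xy <- if (is.null(seed)) draw_xy() else with_local_seed(seed, draw_xy())
  # Drawn from the restored (unseeded) RNG state, so it differs per call.
  xy$arrival_rate <- runif(n, min = 1/80, max = 1)
  xy
}

p1 <- make_points(seed = 42)
p2 <- make_points(seed = 42)
identical(p1$x, p2$x)                        # same coordinates
identical(p1$arrival_rate, p2$arrival_rate)  # different arrival rates
```

Restoring the state instead of re-seeding means the arrival rates continue from wherever the session's RNG already was, so they do not depend on how set.seed(NULL) picks a new seed.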