J. Doe
J. Doe

Reputation: 1740

r - draw random samples from a uniform distribution with a gap in the interval

I am interested into drawing random samples from a uniform distribution which is spread across 2 intervals: from -2 to -1 and from 1 to 2. Any value which belongs to these intervals should have equal probability of being sampled and the other values (e.g. from -1 to 1) should have 0 probability from being sampled.

The runif function only accepts single numbers as arguments for the upper and lower limit. Thus, it is impossible to make that function sample from two intervals.

Another option would be running two runif functions, one on the interval from -2 to -1, and one on the interval from 1 to 2. However, in this case I explicitly make it so that in each draw there will be an equal number of numbers drawn from each interval.

I want to allow this to vary and, while on the average there will be roughly the same amount of positive and negative numbers, I still need to see some random variation, e.g. draws where more numbers were drawn from the positive interval or vice versa.

Upvotes: 4

Views: 762

Answers (4)

GKi
GKi

Reputation: 39667

Maybe add the gap afterwards like:

n <- 10
x <- runif(n, -2, 0)
i <- x > -1
x[i] <- x[i] + 2

As a Function:

runifGap <- function(n, xMin, xMax, gapMin, gapMax) {
  dGap <- gapMax- gapMin
  x <- runif(n, xMin, xMax - dGap)
  i <- x>gapMin
  x[i] <- x[i] + dGap
  x
}
runifGap(10, -2, 2, -1, 1)

Or in case of more than one gap:

runifGaps <- function(n, r) {
  d <- diff(r)
  dj <- d[c(FALSE, TRUE)]
  x <- runif(n, r[1], r[1] + sum(d[c(TRUE, FALSE)]))
  cd <- r[c(FALSE, TRUE)]
  for(i in seq_along(dj)) {
    j <- which(x > cd[i])
    x[j] <- x[j] + dj[i]
  }
  x
}
runifGaps(10, c(-2, -1, 1, 2))
runifGaps(10, c(-2, -1, 1, 2, 4, 6)) #for ranges -2:-1, 1:2 and 4:6

Or using findInterval instead of a for loop.

runifGapsB <- function(n, r) {
  d <- diff(r)
  x <- runif(n, r[1], r[1] + sum(d[c(TRUE, FALSE)]))
  x + c(0, cumsum(d[c(FALSE, TRUE)]))[1 +
                 findInterval(x, r[1] + cumsum(head(d[c(TRUE, FALSE)], -1)))]
}

Benchmark

n <- 1e7

bench::mark(check = FALSE,
shs = runif_multiinterval(n = n, intervals = tribble(~min, ~max, -2, -1,  1, 2)),
"M.Viking" = ifelse(runif(n)>0.5, runif(n,-2,-1), runif(n,1,2)),
"Stéphane Laurent" = (2L * sample.int(2L, size=n, replace = TRUE) - 3L) * runif(n, 1, 2),
LaurentGKi = sample(c(-1, 1), n, TRUE) * runif(n, 1, 2),
LaurentGKi2 = c(-1L, 1L)[sample.int(2L, n, TRUE)] * runif(n, 1, 2),
GKi = {x <- runif(n, -2, 0); i <- x>-1; `[<-`(x, i, x[i] + 2)}, 
GKi2 = {x <- runif(n, -2, 0); i <- which(x>-1); `[<-`(x, i, x[i] + 2)},
GKi3 = runifGaps(n, c(-2, -1, 1, 2))
)
#  expression            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 shs                 1.12s    1.12s     0.891     305MB     1.78     1     2
#2 M.Viking         519.98ms 519.98ms     1.92      763MB     5.77     1     3
#3 Stéphane Laurent 274.15ms  276.7ms     3.61      114MB     1.81     2     1
#4 LaurentGKi       270.67ms 270.76ms     3.69      191MB     1.85     2     1
#5 LaurentGKi2      263.47ms 266.96ms     3.75      153MB     1.87     2     1
#6 GKi              214.68ms 218.12ms     4.25      343MB     7.08     3     5
#7 GKi2             191.72ms 200.83ms     5.05      286MB     6.73     3     4
#8 GKi3             168.85ms 170.49ms     5.36      210MB     5.36     3     3

In this example the methods of GKi are the fastest and the method of Stéphane Laurent uses the lowest amount of memory.

Upvotes: 4

shs
shs

Reputation: 3901

I wrote a small function to generalize uniform sampling across multiple intervals. It uses 2-stage sampling. First, the number of units per interval is drawn from the multinomial distribution, weighted by the respective interval width. Second, runif is applied to those draws with the specified interval parameters.

library(dplyr)
library(purrr)

#' Draw uniformly across multiple intervals.
#' @param n number of output elements
#' @param intervals data frame of intervals. Must have 2 columns: min, max
runif_multiinterval <- \(n, intervals) {
  intervals |>
    mutate(n = rmultinom(1, n, prob = max - min)) |> # 1st stage: draw group sizes
    pmap(runif) |> # 2nd stage: draw within intervals
    unlist() |>
    sample() # randomize order
}

Works for your case:

runif_multiinterval(
  n = 1e5,
  intervals = tribble(
    ~min, ~max,
      -2,   -1,
       1,    2
  )) |>
  hist()

Works for multiple, differently sized intervals:

runif_multiinterval(
  n = 1e5,
  intervals = tribble(
    ~min, ~max,
      -2,   -1,
       1,    2,
       4,    6
  )) |>
  hist()

Created on 2023-02-13 with reprex v2.0.2

Upvotes: 3

St&#233;phane Laurent
St&#233;phane Laurent

Reputation: 84529

Since this is symmetric around 0 you can simulate a random sign and assign it to a uniform value in [1,2]:

nsims <- 1e6L
random_sign <- 2L * sample.int(2L, size=nsims, replace = TRUE) - 3L
sims <- random_sign * runif(nsims, 1, 2)

hist(sims)

Upvotes: 4

M.Viking
M.Viking

Reputation: 5398

Edit: As my first answer led to exactly equal amounts of positive and negative values, here is another way that executes runif 3 times:

ifelse(runif(1000)>0.5, runif(1000,-2,-1), runif(1000,1,2))

Appears to be undocumented, but experimentation led me to:

runif(1000, min=c(-2, 1), max=c(-1, 2))

example:

hist(runif(1000000, min=c(-2, 1), max=c(-1, 2)))

enter image description here

Upvotes: 4

Related Questions