Antoni Parellada

Reputation: 4811

Loop While condition is TRUE

I am trying to generate n random numbers whose sum is less than 1.

So I can't just run runif(3). But I can condition each iteration on the sum of all values generated up to that point.

The idea is to start with an empty vector, v, and set up a loop such that on each iteration, i, a runif() value is generated, but before it is accepted as an element of v, i.e. v[i] <- runif(), the test sum(v) < 1 is carried out on the vector including the new candidate: if the test is TRUE, the entry v[i] is accepted, BUT if it is FALSE, i.e. the sum exceeds 1, v[i] is tossed out of the vector and iteration i is repeated.

I am far from implementing this idea, but I would like to resolve it along the lines of what follows. It's not so much a practical problem as an exercise to understand the syntax of loops in general:

n <- 4
v <- 0

for (i in 1:n){
    rdom <- runif(1)
    if((sum(v) + rdom) < 1) v[i] <- rdom
    # keep trying before moving on to iteration i + 1???? i <- stays i?????
}

I have looked into while (I even put it in the title); however, I need the vector to end up with n elements, so I get stuck if I try something that simply tells R to keep adding random uniform realizations to v while sum(v) < 1, because I can end up with fewer than n elements in v.

Upvotes: 1

Views: 1430

Answers (4)

RHertel

Reputation: 23818

Here's how I would do it, without any loop, if or while:

set.seed(123)
x <- runif(1) # start with the sum that you want to obtain
n <- 4 # number of generated random numbers, can be chosen arbitrarily 
y <- sort(runif(n-1,0,x)) # choose n-1 random points to cut the range [0:x]
z <- c(y[1],diff(y),x-y[n-1]) # result: determine the length of the segments
#> z
#[1] 0.11761257 0.10908627 0.02723712 0.03364156
#> sum(z)
#[1]  0.2875775
#> all.equal(sum(z),x)
#[1] TRUE

The advantage here is that you can determine exactly which sum you want to obtain and how many numbers n you want to generate for this. If you set, e.g., x <- 1 in the second line, the n random numbers stored in the vector z will add up to one.
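
As an illustrative aside (not part of the original answer), the x <- 1 case can be wrapped in a small helper; the function name rand_sum_one is hypothetical:

rand_sum_one <- function(n) {
  y <- sort(runif(n - 1, 0, 1))   # n-1 random cut points in [0, 1]
  c(y[1], diff(y), 1 - y[n - 1])  # segment lengths, which sum to exactly 1
}
set.seed(123)
z <- rand_sum_one(4)
sum(z)  # should be 1 (up to floating-point error)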

Upvotes: 1

josliber

Reputation: 44340

If you really want to keep exactly the same procedure that you have posted (aka iteratively sample the n values one at a time from the standard uniform distribution, rejecting any samples that cause your sum to exceed 1), then the following code is mathematically equivalent, shorter, and more efficient:

samp <- function(n) {
  v <- rep(0, n)
  for (i in 1:n) {
    v[i] <- runif(1, 0, 1-sum(v))
  }
  v
}

Basically, this code uses the mathematical fact that if the sum of the vector is currently sum(v), then sampling from the standard uniform distribution until you get a value no greater than 1-sum(v) is exactly equivalent to sampling from the uniform distribution on [0, 1-sum(v)]. The advantage of the latter approach is that it's much more efficient -- we don't need to keep rejecting samples and trying again, and can instead sample just once for each element.
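
As a quick, illustrative sanity check (not from the original answer), you can confirm that every vector returned by samp() sums to less than 1:

set.seed(1)
all(replicate(1000, sum(samp(4))) < 1)  # should be TRUE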

To get a sense of the runtime differences, consider sampling 100 observations with n=10, comparing to a working implementation of the code from your post (copied from my other answer to this question):

OP <- function(n) {
  v <- rep(0, n)
  for (i in 1:n){
    rdom <- runif(1)
    while (sum(v) + rdom > 1) rdom <- runif(1)
    v[i] <- rdom
  }
  v
}
set.seed(144)
system.time(samples.OP <- replicate(100, OP(10)))
#    user  system elapsed 
# 261.937   1.641 265.805 
system.time(samples.josliber <- replicate(100, samp(10)))
#    user  system elapsed 
#   0.004   0.001   0.004

In this case, the new approach is roughly 65,000 times faster.

Upvotes: 2

josliber

Reputation: 44340

It sounds like you're trying to uniformly sample from a space of n variables where the following constraints hold:

x_1 + x_2 + ... + x_n <= 1
x_1 >= 0
x_2 >= 0
...
x_n >= 0

The "hit and run" algorithm is the mathematical machinery that enables you to do exactly this. In 2-dimensional space, the algorithm will sample uniformly from the following triangle, with each location in the shaded area being equally likely to be selected:

[figure: points sampled uniformly over the shaded triangle defined by x_1 >= 0, x_2 >= 0, x_1 + x_2 <= 1]

The algorithm is provided in R through the hitandrun package, which requires you to specify the linear inequalities that define the space through a constraint matrix, direction vector, and right-hand side vector:

library(hitandrun)
n <- 3
constr <- list(constr = rbind(rep(1, n), -diag(n)),
               dir = c(rep("<=", n+1)),
               rhs = c(1, rep(0, n)))
set.seed(144)
samples <- hitandrun(constr, n.samples=1000)
head(samples, 10)
#             [,1]       [,2]       [,3]
#  [1,] 0.28914690 0.01620488 0.42663224
#  [2,] 0.65489979 0.28455231 0.00199671
#  [3,] 0.23215115 0.00661661 0.63597912
#  [4,] 0.29644234 0.06398131 0.60707269
#  [5,] 0.58335047 0.13891392 0.06151205
#  [6,] 0.09442808 0.30287832 0.55118290
#  [7,] 0.51462261 0.44094683 0.02641638
#  [8,] 0.38847794 0.15501252 0.31572793
#  [9,] 0.52155055 0.09921046 0.13304728
# [10,] 0.70503030 0.03770875 0.14299089

Breaking down this code a bit, we generated the following constraint matrix:

constr
# $constr
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]   -1    0    0
# [3,]    0   -1    0
# [4,]    0    0   -1
# 
# $dir
# [1] "<=" "<=" "<=" "<="
# 
# $rhs
# [1] 1 0 0 0

Reading across the first line of constr$constr we have 1, 1, 1 which indicates "1*x1 + 1*x2 + 1*x3". The first element of constr$dir is <=, and the first element of constr$rhs is 1; putting it together we have x1 + x2 + x3 <= 1. From the second row of constr$constr we read -1, 0, 0 which indicates "-1*x1 + 0*x2 + 0*x3". The second element of constr$dir is <= and the second element of constr$rhs is 0; putting it together we have -x1 <= 0 which is the same as saying x1 >= 0. The similar non-negativity constraints follow in the remaining rows.
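
As an illustrative check (not part of the original answer), you can verify that every row of samples satisfies these constraints:

all(rowSums(samples) <= 1)  # should be TRUE
all(samples >= 0)           # should be TRUE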

Note that the hit-and-run algorithm has the nice property that each of the variables ends up with exactly the same distribution:

hist(samples[,1])
hist(samples[,2])
hist(samples[,3])

[histograms: samples[,1], samples[,2] and samples[,3] all show the same distribution]

Meanwhile, the distribution of the samples from your procedure will be highly uneven, and as n increases this problem will get worse and worse.

OP <- function(n) {
  v <- rep(0, n)
  for (i in 1:n){
    rdom <- runif(1)
    while (sum(v) + rdom > 1) rdom <- runif(1)
    v[i] <- rdom
  }
  v
}
samples.OP <- t(replicate(1000, OP(3)))

hist(samples.OP[,1])
hist(samples.OP[,2])
hist(samples.OP[,3])

[histograms: the distributions of samples.OP[,1], samples.OP[,2] and samples.OP[,3] are highly uneven and differ from one another]

An added advantage is that the hit-and-run algorithm appears faster -- I generated these 1000 replicates in 0.006 seconds on my computer with hit-and-run and it took 0.3 seconds using the modified code from the OP.
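
For reference, a timing comparison along these lines could be reproduced with something like the following sketch; the exact numbers will vary by machine and are not from the original answer:

system.time(samples.har <- hitandrun(constr, n.samples = 1000))
system.time(samples.OP <- t(replicate(1000, OP(3))))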

Upvotes: 1

Molx

Reputation: 6931

Here's a possible solution. It originally used the more generic repeat rather than while, but I edited it to use while and save a couple of lines.

set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
while (i < n) {
  ith <- runif(1)
  if (sum(c(v, ith)) < 1) {
    i <- i+1
    v[i] <- ith
  }
}
v
# [1] 0.89669720 0.06178627 0.01339033 0.02333120

Using a repeat block you must check the condition explicitly anyway, but, with the vector-growing problem removed, it looks very similar:

set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
repeat {
  ith <- runif(1)
  if (sum(c(v, ith)) < 1) {
    i <- i+1
    v[i] <- ith
  }
  if (i == n) break
} 

Upvotes: 2
