dugar
dugar

Reputation: 324

Random sampling using running length encoding (rle)

Is it possible to perform a sampling without replacement? Something like this (it does not work as desired):

x <- rle(c(1,1,1,1,1,2,2))

result <- sample(x, size=2, replace=F)

print(inverse.rle(result))
# prints [1] 1 1 1 1 1 1 2 2
# a desired result [1] 1 1

In other words, I'd like to have the following working but with running length encoding:

set.seed(2)  
x <- c(1,1,1,1,1,2,2)

result <- sample(x, size=2, replace=F)

print(result)
# prints [1] 1 2

Upvotes: 1

Views: 111

Answers (2)

Rubbaa
Rubbaa

Reputation: 31

The sampling works out of the box if instead of rle() you use the S4Vectors function Rle().

x <- Rle(c(1,1,1,1,1,2,2))
# numeric-Rle of length 7 with 2 runs
# Lengths: 5 2
# Values : 1 2
xs <- sample(x, 4, replace=F)
# numeric-Rle of length 4 with 2 runs
# Lengths: 2 2
# Values : 2 1
xs2 <- Rle(sort(xs))
# numeric-Rle of length 4 with 2 runs
# Lengths: 2 2
# Values : 1 2
as.vector(xs2)
# [1] 1 1 2 2

Upvotes: 0

pseudospin
pseudospin

Reputation: 2767

Here's a function to do it. You probably need some big numbers to make this worthwhile over just expanding out the rle explicitly.

x <- rle(c(1,1,1,1,1,2,2))

sample_rle <- function(x, ...) {
  x$values[1+findInterval(
    sample(sum(x$lengths), ...),
    cumsum(x$lengths), 
    left.open=TRUE)]
}

sample_rle(x, size = 2, replace = FALSE)
#> [1] 2 1
sample_rle(x, size = 7, replace = FALSE)
#> [1] 2 1 2 1 1 1 1

Upvotes: 1

Related Questions