Dekike

Reputation: 1284

Create a vector from another selecting some of its values at random but in order and with a minimum distance between selected ones?

I have a vector of numbers and I would like to select some of them at random but in order. How could I do it?

For example:

vector <- runif(10, min=0, max=101)
vector 

  [1] 35.956732 67.608039 20.099881 23.184217  9.157408 34.105185 97.459770 25.805254 74.537667 18.865662

Which code can I use to create a new vector containing, for example, four of the 10 values, with the requirement that those four values appear in the same order as in the original vector? That is, the result cannot be 9.157408 67.608039 74.537667 97.459770, but it could be 67.608039 9.157408 97.459770 74.537667.

Any help would be great. Thanks in advance.

Second part (updated)

What if I want a minimum number of steps between consecutive selected values?

That is, if I have for instance this vector:

[1] 2.1 3.4 1.6 8.9 2.3 5.4 6.4 1.3 10.8 3.7 13.4 2.4 5.4 6.8

How can I select 3 of those 14 values with the additional condition that there must be at least 3 non-selected values between any two consecutive selected ones? For example, a selected vector could be 2.1 5.4 6.8, but it couldn't be 1.6 5.4 10.8.

Upvotes: 3

Views: 491

Answers (4)

ThomasIsCoding

Reputation: 102251

Try sample(), for example:

vector[sort(sample(length(vector),4))]

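or (note that this can return fewer than 4 values, since each element is kept only with probability 1/2)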

vector[head(which(sample(c(TRUE,FALSE),length(vector),replace = TRUE)),4)]
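A variant of the logical-mask idea that always keeps exactly k values is to shuffle a mask containing exactly k TRUE entries. This is only a sketch; the names x, k and keep are illustrative and not part of the original answer.

```r
set.seed(1)  # only to make the illustration reproducible
x <- runif(10, min = 0, max = 101)
k <- 4

# Build a mask with exactly k TRUEs and shuffle it; logical subsetting
# then returns the selected values in their original order.
keep <- sample(rep(c(TRUE, FALSE), times = c(k, length(x) - k)))
x[keep]
```

Because the mask is applied position by position, no sorting step is needed afterwards.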

Update

If you have constraints regarding the minimum spacing between random indices, you can try the code below:

  • non-optimized method
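f1 <- function(vec, n, min_spacing = 4) {
  # guard against an endless loop when the request cannot be satisfied
  stopifnot((n - 1) * min_spacing + 1 <= length(vec))
  idx <- seq_along(vec)
  repeat {
    # draw n indices and retry until all gaps are large enough
    k <- sort(sample(idx, n))
    if (all(diff(k) >= min_spacing)) break
  }
  vec[k]
}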

  • optimized method
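f2 <- function(vec, n, min_spacing = 4) {
  # pick one value from each block of min_spacing consecutive elements;
  # indexing via sample.int avoids sample()'s special case for length-1 input
  pick_one <- function(x) x[sample.int(length(x), 1)]
  u <- unname(tapply(vec, ceiling(seq_along(vec) / min_spacing), pick_one))
  # keep every other block so the spacing holds; may return fewer than n values
  head(u[seq(1, length(u), by = 2)], n)
}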
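If the retry loop is too slow and the block approach too restrictive, one possible alternative is to draw the indices from a shortened range and then stretch them apart. The sketch below is not from the answer above; the function name f3 and its error message are hypothetical, and it keeps the same convention that min_spacing is the minimum difference between consecutive selected indices.

```r
# Sketch: pick n indices whose consecutive differences are >= min_spacing,
# without any retry loop.
f3 <- function(vec, n, min_spacing = 4) {
  gap <- min_spacing - 1              # extra positions forced between picks
  m <- length(vec) - (n - 1) * gap    # length of the compressed index range
  if (m < n) stop("vec is too short for n values with this spacing")
  k <- sort(sample.int(m, n))         # ordered draw from the compressed range
  idx <- k + (seq_len(n) - 1) * gap   # shift the j-th pick by (j - 1) * gap
  vec[idx]
}

v <- c(2.1, 3.4, 1.6, 8.9, 2.3, 5.4, 6.4, 1.3, 10.8, 3.7, 13.4, 2.4, 5.4, 6.8)
f3(v, 3)
```

Each sorted draw from the compressed range maps to exactly one valid set of indices and vice versa, so every valid selection should be equally likely.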

Upvotes: 1

akrun

Reputation: 887511

We can sample 4 elements from the vector, then use match to get their indices, sort the indices, and subset the vector (this assumes the values are unique)

v1 <- sample(vector, 4)
vector[sort(match(v1, vector))]

If we need to sample 1 element from every block of 4, we could use rollapply from zoo, specifying the width and by arguments

library(zoo)
rollapply(v2, 4, by = 4, FUN = function(x) sample(x, 1))
#[1] 1.6 1.3 2.4

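Or use a loop (out holds the selected indices, so v2[out] gives the values)

out <- c()
i <- 1
# stop once fewer than 4 positions remain after i
while ((i + 4) <= length(v2)) {
  i1 <- i:(i + 2)        # candidate window of 3 indices
  tmp <- sample(i1, 1)   # pick one index from the window
  out <- c(out, tmp)
  # next window starts 3 positions after the pick; use tmp + 4 if at
  # least 3 unselected values are required between consecutive picks
  i <- tmp + 3
}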

out
#[1]  3  7 11

data

v2 <- c(2.1, 3.4, 1.6, 8.9, 2.3, 5.4, 6.4, 1.3, 10.8, 3.7, 13.4, 2.4, 
5.4, 6.8)
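To check whether a set of selected indices meets the spacing requirement from the question, a small helper along these lines may be useful. The name has_min_gap and its min_gap argument are hypothetical; the two calls use the index positions of the examples given in the question.

```r
# Hypothetical helper: TRUE if at least `min_gap` unselected positions lie
# between every pair of consecutive selected indices.
has_min_gap <- function(idx, min_gap = 3) {
  # the difference between two indices is the number of values in between + 1
  all(diff(sort(idx)) >= min_gap + 1)
}

has_min_gap(c(1, 6, 14))  # positions of 2.1 5.4 6.8  -> TRUE
has_min_gap(c(3, 6, 9))   # positions of 1.6 5.4 10.8 -> FALSE
```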

Upvotes: 2

jared_mamrot

Reputation: 26690

One option is to use the createDataPartition() function from the caret package, e.g.

library(caret)
vector <- runif(10, min=0, max=101)
vector
#>[1] 49.12759 37.39169 99.31837 39.22023 23.15373 62.95305 13.79056 97.71442
#>[9] 52.02225 16.47010

sampling_index <- createDataPartition(y = vector, times = 1,
                                      p = 0.3, list = FALSE)
vector[sampling_index]
#>[1] 49.12759 39.22023 23.15373 97.71442

Upvotes: 2

Brian Davis

Reputation: 992

Is this what you're looking for? Simply sort the sampled indices so that the selected values keep their original order.

vector <- runif(10, min=0, max=101)
n <- 5
i <- sort(sample(seq_along(vector),n))
vector[i]

Upvotes: 2
