nhaus

Reputation: 1023

Shuffle/permute a vector block-wise

I am trying to shuffle the elements of a simple vector, but I want to do this block-wise, so a plain call to sample() doesn't work.

By that I mean that every 2 consecutive values belong to one "block", and I only want to shuffle the blocks, not the individual values.

I thought about grouping my values together using group_by, but the values within each block have nothing in common other than that they occur together.

This is what an example vector might look like (the real vector has over 5e6 elements).

#      <------- BLOCK1 ----->  <----- BLOCK 2 ------>    ........
x <- c(0.15055060, 0.69097695, 0.89731929, 0.84515906, 0.54843043, 0.77026955, 0.05127419, 0.33850021, 0.47623089, 0.36896818)

A "successful" shuffle would look like this, for example:

#                              <------- BLOCK1 ----->                         <----- BLOCK 2 ------>
x <- c(0.05127419, 0.33850021, 0.15055060, 0.69097695, 0.47623089, 0.36896818, 0.89731929, 0.84515906, 0.54843043, 0.77026955)

Any insights into how I might accomplish this would be much appreciated!

Upvotes: 1

Views: 155

Answers (2)

Joseph Wood

Reputation: 7597

Since matrices are stored column-wise in R, we can simply convert our vector to a matrix with the number of rows equal to the block size. After that, all we need to do is shuffle the columns of the matrix and convert it back to a vector. Here is a simple function to do that:

ShuffleBlocks <- function(v, size = 2L) {
    size <- as.integer(size)
    stopifnot(length(v) %% size == 0L)
    
    mat <- matrix(v, nrow = size)
    as.vector(mat[, sample(ncol(mat))])
}

Calling it on the OP's example:

set.seed(42)
ShuffleBlocks(x)
# [1] 0.15055060 0.69097695 0.47623089 0.36896818 0.05127419 0.33850021 0.54843043 0.77026955 0.89731929 0.84515906
#        x[1]       x[2]        x[9]       x[10]      x[7]       x[8]       x[5]       x[6]       x[3]      x[4]

x
#[1] 0.15055060 0.69097695 0.89731929 0.84515906 0.54843043 0.77026955 0.05127419 0.33850021 0.47623089 0.36896818

This generalizes smoothly for different block sizes. For example:

set.seed(321)
ShuffleBlocks(x, size = 5)
# [1] 0.77026955 0.05127419 0.33850021 0.47623089 0.36896818 0.15055060 0.69097695 0.89731929 0.84515906 0.54843043 
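
To check that a shuffle really kept every block intact, one option is to compare the blocks as unordered sets. The following is only a sketch: SameBlocks is a hypothetical helper (not part of the answer above), and ShuffleBlocks is repeated so the snippet runs on its own.

```r
# Copy of the function from the answer, repeated so this sketch is self-contained.
ShuffleBlocks <- function(v, size = 2L) {
    size <- as.integer(size)
    stopifnot(length(v) %% size == 0L)

    mat <- matrix(v, nrow = size)
    as.vector(mat[, sample(ncol(mat))])
}

# Hypothetical helper: TRUE if `shuffled` consists of exactly the same blocks
# as `original`, in any order. Each block is collapsed into one string key.
SameBlocks <- function(original, shuffled, size = 2L) {
    key <- function(v) sort(apply(matrix(v, nrow = size), 2, paste, collapse = "|"))
    identical(key(original), key(shuffled))
}

x <- c(0.15055060, 0.69097695, 0.89731929, 0.84515906, 0.54843043,
       0.77026955, 0.05127419, 0.33850021, 0.47623089, 0.36896818)

SameBlocks(x, ShuffleBlocks(x))               # blocks of 2 are preserved
SameBlocks(x, ShuffleBlocks(x, size = 5), 5)  # blocks of 5 are preserved
```

Both calls should return TRUE regardless of the seed, because only whole columns of the matrix are reordered.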

Upvotes: 2

Ronak Shah

Reputation: 388982

Shuffle only the block numbers -

inds <- sample(length(x)/2) * 2
x[c(rbind(inds - 1, inds))]
  • length(x)/2 is the number of blocks in x
  • we sample the block numbers and multiply by 2 to get the index of the second value of each block
  • subtracting 1 gives the index of the first value of each block
  • rbind() interleaves the two index vectors, and c() flattens them into a single index used to subset x.
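
The steps above can be traced on a small vector. This is a minimal sketch with made-up values (x_demo) and a fixed block order (block_order) standing in for sample(), so that the result is reproducible:

```r
# Hypothetical 3-block vector; values chosen so block membership is easy to read.
x_demo <- c(11, 12, 21, 22, 31, 32)

# Fixed block order used here instead of sample(length(x_demo) / 2).
block_order <- c(3, 1, 2)

inds <- block_order * 2           # index of the second value of each block: 6 2 4
idx  <- c(rbind(inds - 1, inds))  # interleave first/second indices: 5 6 1 2 3 4

x_demo[idx]
# [1] 31 32 11 12 21 22
```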

A general solution for blocks of any size would be -

n <- 5000 #block size
inds <- sample(length(x)/n) * n
x[c(sapply(inds, `-`, (n-1):0))]
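
To see what the sapply() call builds, here is a small sketch with a block size of 3. The vector x_demo and the fixed block_order are assumptions for illustration; in real use block_order would come from sample().

```r
x_demo <- 1:9               # three blocks of size 3: (1,2,3), (4,5,6), (7,8,9)
n <- 3                      # block size
block_order <- c(2, 3, 1)   # fixed stand-in for sample(length(x_demo) / n)

inds <- block_order * n                  # last index of each block: 6 9 3
idx_mat <- sapply(inds, `-`, (n - 1):0)  # one column of indices per block
idx_mat
#      [,1] [,2] [,3]
# [1,]    4    7    1
# [2,]    5    8    2
# [3,]    6    9    3

x_demo[c(idx_mat)]          # c() flattens the matrix column by column
# [1] 4 5 6 7 8 9 1 2 3
```

Note that length(x) must be a multiple of n for this indexing to cover the vector exactly.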

Upvotes: 1
