I am trying to shuffle the elements of a simple vector, but I want to do this block-wise, so a plain sample() call doesn't work.
What I mean is that every 2 consecutive values belong to one "block", and I only want to shuffle the blocks, not the individual values.
I thought about grouping my values with group_by(), but the values within each block have nothing in common other than that they occur together.
This is what an example vector might look like (the real vector has over 5e6 elements):
#         <------ BLOCK 1 ----->  <------ BLOCK 2 ----->  ...
x <- c(0.15055060, 0.69097695, 0.89731929, 0.84515906, 0.54843043, 0.77026955, 0.05127419, 0.33850021, 0.47623089, 0.36896818)
A "successful" shuffle would look, for example, like this:
#         <------ BLOCK 1 ----->  <------ BLOCK 2 ----->  ...
x <- c(0.05127419, 0.33850021, 0.15055060, 0.69097695, 0.47623089, 0.36896818, 0.89731929, 0.84515906, 0.54843043, 0.77026955)
Any insights into how I might accomplish this would be much appreciated!
Since matrices are stored column-wise in R, we can simply convert our vector to a matrix whose number of rows equals the block size. After that, all we need to do is shuffle the columns of the matrix and convert it back to a vector. Here is a simple function to do that:
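ShuffleBlocks <- function(v, size = 2L) {
  size <- as.integer(size)
  # the vector must split evenly into blocks of `size`
  stopifnot(length(v) %% size == 0L)
  # matrix() fills column by column, so each column holds one block
  mat <- matrix(v, nrow = size)
  # permute the columns (blocks), then flatten back column by column
  as.vector(mat[, sample(ncol(mat))])
}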
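To see why the matrix trick keeps blocks together, here is a small sketch using a toy vector 1:10 (an illustrative stand-in, not the OP's data) and a fixed column order in place of sample() so the result is reproducible:

```r
# Toy vector whose values equal their positions, so the blocks are easy to spot
v <- 1:10

# matrix() fills column by column: with nrow = 2, each column is one block
mat <- matrix(v, nrow = 2)
print(mat)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10

# Reordering the columns moves whole blocks; as.vector() reads the matrix back column by column
shuffled <- as.vector(mat[, c(5, 1, 3, 2, 4)])
print(shuffled)
# [1]  9 10  1  2  5  6  3  4  7  8
```

ShuffleBlocks() does the same thing, except that the column order comes from sample(ncol(mat)).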
Calling it on the OP's example:
set.seed(42)
ShuffleBlocks(x)
# [1] 0.15055060 0.69097695 0.47623089 0.36896818 0.05127419 0.33850021 0.54843043 0.77026955 0.89731929 0.84515906
# x[1] x[2] x[9] x[10] x[7] x[8] x[5] x[6] x[3] x[4]
x
#[1] 0.15055060 0.69097695 0.89731929 0.84515906 0.54843043 0.77026955 0.05127419 0.33850021 0.47623089 0.36896818
This generalizes smoothly for different block sizes. For example:
set.seed(321)
ShuffleBlocks(x, size = 5)
# [1] 0.77026955 0.05127419 0.33850021 0.47623089 0.36896818 0.15055060 0.69097695 0.89731929 0.84515906 0.54843043
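Because the printed results depend on the seed, a seed-independent sanity check can help. The sketch below uses a hypothetical helper BlocksIntact() (not part of the answer) that maps each shuffled value back to its position in x and confirms that only whole blocks were moved. It assumes the values in x are distinct, as they are in the OP's example. It also shows the stopifnot() guard rejecting a block size that does not divide the vector length:

```r
# ShuffleBlocks() from above, repeated so this snippet runs on its own
ShuffleBlocks <- function(v, size = 2L) {
  size <- as.integer(size)
  stopifnot(length(v) %% size == 0L)
  mat <- matrix(v, nrow = size)
  as.vector(mat[, sample(ncol(mat))])
}

x <- c(0.15055060, 0.69097695, 0.89731929, 0.84515906, 0.54843043,
       0.77026955, 0.05127419, 0.33850021, 0.47623089, 0.36896818)

# Hypothetical helper: TRUE if `shuffled` is `v` with whole blocks of `size` reordered
BlocksIntact <- function(v, shuffled, size) {
  # original position of each shuffled value, one column per block (assumes distinct values)
  pos <- matrix(match(shuffled, v), nrow = size)
  starts_ok <- all((pos[1, ] - 1) %% size == 0)              # each block starts at a block boundary
  runs_ok   <- all(diff(pos) == 1)                           # positions inside a block are consecutive
  all_used  <- identical(sort(as.vector(pos)), seq_along(v)) # every position used exactly once
  starts_ok && runs_ok && all_used
}

BlocksIntact(x, ShuffleBlocks(x), size = 2)
# [1] TRUE
BlocksIntact(x, ShuffleBlocks(x, size = 5), size = 5)
# [1] TRUE

# 10 is not a multiple of 3, so stopifnot() inside ShuffleBlocks() raises an error
tryCatch(ShuffleBlocks(x, size = 3), error = function(e) "error: length not a multiple of size")
# [1] "error: length not a multiple of size"
```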
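Shuffle only the block numbers:
inds <- sample(length(x)/2) * 2   # shuffled block numbers times 2 = index of each block's second value
x[c(rbind(inds - 1, inds))]       # interleave first and second indices, block by block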
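To see the interleaving concretely, here is the same expression on the question's x with a fixed block order standing in for sample(length(x)/2). The particular order c(4, 1, 5, 2, 3) is an arbitrary choice for illustration:

```r
x <- c(0.15055060, 0.69097695, 0.89731929, 0.84515906, 0.54843043,
       0.77026955, 0.05127419, 0.33850021, 0.47623089, 0.36896818)

# Fixed block order instead of sample(length(x)/2), so the result is reproducible
blocks <- c(4, 1, 5, 2, 3)
inds <- blocks * 2            # index of the second value of each block: 8 2 10 4 6

# rbind() puts the first indices in row 1 and the second indices in row 2;
# c() then reads that matrix column by column, so each block's two indices stay adjacent
idx <- c(rbind(inds - 1, inds))
print(idx)
# [1]  7  8  1  2  9 10  3  4  5  6

# This order reproduces the "successful" shuffle shown in the question
x[idx]
```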
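length(x)/2 is the number of blocks in x. We sample the block numbers and multiply them by 2 to get the index of the second value of each block; inds - 1 is then the index of the first value, and c(rbind(inds - 1, inds)) interleaves the two so that each block stays together.
A general solution for blocks of any size would be: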
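n <- 5000  # block size
stopifnot(length(x) %% n == 0)     # x must split evenly into blocks of n
inds <- sample(length(x)/n) * n    # index of the last value of each block, in shuffled order
x[c(sapply(inds, `-`, (n-1):0))]   # expand each last index into the block's n indices, in order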
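To see what the general expression builds, here is a sketch on the question's 10-element x with a block size of 5 (5000 does not divide 10) and a fixed block order in place of sample():

```r
x <- c(0.15055060, 0.69097695, 0.89731929, 0.84515906, 0.54843043,
       0.77026955, 0.05127419, 0.33850021, 0.47623089, 0.36896818)

n <- 5                    # block size small enough for the 10-element example
inds <- c(2, 1) * n       # fixed order instead of sample(length(x)/n) * n: 10 5

# sapply() calls `-`(inds[i], (n-1):0) for each block, giving one column per block:
# inds[i] - 4, ..., inds[i] - 0; c() then flattens the columns in order
idx <- c(sapply(inds, `-`, (n - 1):0))
print(idx)
# [1]  6  7  8  9 10  1  2  3  4  5

x[idx]   # the second block of five values, followed by the first

# With n = 2 the general expression builds the same indices as the rbind() version
inds2 <- c(4, 1, 5, 2, 3) * 2
stopifnot(identical(c(sapply(inds2, `-`, 1:0)), c(rbind(inds2 - 1, inds2))))
```

The last line checks that the sapply() form is a strict generalization of c(rbind(inds - 1, inds)) for blocks of 2.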