Randomise blocks of values

Question

I have a dataset with several variables (measurements of length, width, etc.) and a grouping variable, line. I want to randomise the data so that the between-line variance of each variable is preserved, but the among-line covariances between variables are broken.

Using iris as an example, the code below keeps each species' block of values together and reassigns each block to a randomly chosen species, independently for each trait. That is exactly what I want, but it also introduces NAs. How can I collapse the result so that it has the same shape as the original data?

library(data.table)
set.seed(21)

# Three rows from each species, with a row id
dtIris <- data.table(id = 1:9, iris[c(1:3, 51:53, 101:103), ])

dtIris

# Melt to long format, join a per-trait permutation of the species labels,
# drop the old label, then cast back to wide format
dcast(
  melt(dtIris, id.vars = c('id', 'Species'))[
    melt(dtIris, id.vars = c('id', 'Species'))[, 
      .('Species' = unique(Species), 'new' = sample(unique(Species))), by = variable], 
    on = c('Species', 'variable')][, -c('Species')], 
  ... ~ variable, value.var = 'value')

This melts the data to long format, samples a permutation of the unique species for each trait, joins that mapping back onto the long data, and then casts it back to wide format. It leaves NAs because id is still the original row id while new differs between traits, so each id ends up split across several rows.

    id        new Sepal.Length Sepal.Width Petal.Length Petal.Width
 1:  1     setosa           NA         3.5          1.4          NA
 2:  1  virginica          5.1          NA           NA         0.2
 3:  2     setosa           NA         3.0          1.4          NA
...
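
One possible way to keep the original shape, sketched under the assumption that every group has the same number of rows (true for this subset and for the full iris data): skip the reshape and move the species blocks in place, one trait at a time. `shuffle_blocks` is a hypothetical helper written for this illustration, not a data.table function.

```r
library(data.table)
set.seed(21)

dtIris <- data.table(id = 1:9, iris[c(1:3, 51:53, 101:103), ])

# Hypothetical helper: move each group's block of values onto the rows of a
# randomly drawn group. Assumes all groups have the same number of rows.
shuffle_blocks <- function(x, g) {
  groups <- unique(g)
  # donors[i] is the group whose values are written into the rows of groups[i]
  donors <- groups[sample.int(length(groups))]
  out <- x
  for (i in seq_along(groups)) {
    out[g == groups[i]] <- x[g == donors[i]]
  }
  out
}

traits <- setdiff(names(dtIris), c("id", "Species"))
dtShuffled <- copy(dtIris)

# A separate permutation is drawn for every trait, so blocks move independently
for (col in traits) {
  set(dtShuffled, j = col, value = shuffle_blocks(dtIris[[col]], dtIris$Species))
}

dtShuffled
```

The result has the same 9 rows as dtIris, with id and Species untouched and no NAs. Each trait column holds the same values as before, with whole blocks moved between species. Because the permutation is random, a block can land back on its own species for some traits.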
