Reputation:
I would like to randomly reorder each column separately between given blocks of rows of a data frame. The row blocks are consecutive as shown:
mylist=list(1:50,51:52,53:102,103:128,129:154,155:180,181:206,207:232,233:258,259:284,285:310,311:336,337:362,363:388,389:414,415:440,441:466,467:492,493:518,519:544,545:570,571:596,597:622,623:648,649:674,675:700)
Say I have a data.frame called dat. It is 700 rows and 50 columns. So basically, for each of these 26 row blocks, I would like each column to be randomly reordered within.
An example with a smaller data.frame could be A =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
where each row bin is as follows:
mylist=list(1:2,3:6,7:9)
could lead to the reordered data frame B=
1 2 1 1 1
2 1 2 2 2
3 4 3 5 3
4 6 4 3 4
5 5 5 6 5
6 3 6 4 6
8 9 8 7 9
9 7 9 8 8
7 8 7 9 7
Thank you.
Upvotes: 2
Views: 372
Reputation: 67828
You may try this:
# create a 'blocking variable'
block <- rep(x = seq_along(mylist), times = sapply(mylist, length))
# within each block, loop over columns and 'shuffle' each column using `sample`
set.seed(1)
B <- do.call(rbind.data.frame,
by(A, block, function(dat){
sapply(dat, function(x) sample(x))
})
)
B
# V1 V2 V3 V4 V5
# 1.1 1 2 1 2 2
# 1.2 2 1 2 1 1
# 2.1 3 6 4 5 3
# 2.2 6 4 5 3 4
# 2.3 4 5 6 6 5
# 2.4 5 3 3 4 6
# 3.1 8 7 9 8 9
# 3.2 9 8 7 9 8
# 3.3 7 9 8 7 7
Update following @Ananda Mahto's comment:
If B
doesn't have to be a data frame and you are happy with a matrix instead, do.call(rbind.data.frame,
can be replaced with do.call(rbind,
, which will be faster.
Upvotes: 1
Reputation: 193687
Here is one approach. It doesn't require a data.frame
named "A" to exist first and, like BrodieG's answer, assumes that there are no holes or repeats in "mylist".
This will result in a matrix with the number of columns specified by Ncol
.
Ncol <- 50 # Number of columns
A1 <- seq_along(unlist(mylist, use.names = FALSE))
do.call(rbind, # ^^ Generate a sequence
lapply(mylist, function(x) { # Traverse the list
replicate(Ncol, sample(A1[x])) # Use replicate with sample
}))
Here it is applied to your small list:
mylist <- list(1:2,3:6,7:9)
set.seed(1) # to be able to reproduce this answer
Ncol <- 5
A1 <- seq_along(unlist(mylist, use.names = FALSE))
do.call(rbind,
lapply(mylist, function(x) {
replicate(Ncol, sample(A1[x]))
}))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 1 2 2
# [2,] 2 1 2 1 1
# [3,] 3 6 4 5 3
# [4,] 6 4 5 3 4
# [5,] 4 5 6 6 5
# [6,] 5 3 3 4 6
# [7,] 8 7 9 8 9
# [8,] 9 8 7 9 8
# [9,] 7 9 8 7 7
Another option to consider is shuffle
from the "permute" package. For this, you create a grouping variable, as Henrik did in his answer, and you use that as the "blocks" within which to shuffle the given range of values.
library(permute)
mylist <- list(1:2,3:6,7:9)
block <- how(blocks = rep(seq_along(mylist), sapply(mylist, length)))
shuffle(length(block$blocks), block)
# [1] 2 1 4 5 3 6 7 9 8
You can easily use replicate
to get a matrix with multiple columns:
set.seed(1)
replicate(5, shuffle(length(block$blocks), block))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 2
# [2,] 2 2 2 2 1
# [3,] 5 3 6 4 5
# [4,] 6 5 3 6 4
# [5,] 3 6 4 5 6
# [6,] 4 4 5 3 3
# [7,] 9 8 7 7 9
# [8,] 8 9 9 8 8
# [9,] 7 7 8 9 7
Upvotes: 1
Reputation: 5219
This will do the trick
dat_new<-dat[,unlist(mapply(function(x) sample(x),mylist))]
Explanation:
sample() picks a random permutation from x
mapply(function(x) sample(x), mylist) will apply sample() to the single blocks
with unlist() you merge the resulting list and put it dat[,here]
Edit Data proof.
mylist<-list(1:2,3:5)
dat<-data.frame(a=1:2,b=2:3,c=3:4,d=4:5,e=5:6)
dat contains
a b c d e
1 1 2 3 4 5
2 2 3 4 5 6
Reshuffling should take place among (a,b) and among (c,d,e)
dat_new<-dat[,unlist(mapply(function(x) sample(x),mylist))]
dat_new contains
b a d c e
1 2 1 4 3 5
2 3 2 5 4 6
Upvotes: 0
Reputation: 52697
Assuming that your mylist
is fully contiguous without holes or repeats (i.e. unlist(mylist) == 1:length(unlist(mylist))
, as is the case with the one you provided, then you can do this relatively easily with any of the "split-apply-combine" methods. Here is a data.table
implementation, where we first create a splitting index that just repeats a label for each group by the numbers of items in that group, and then we split / re-order by group.
dt[, split.idx:=unlist(
lapply(
mylist, # for each item in mylist
function(x) rep(paste0(range(x), collapse="-"), length(x)) # create "min-max" label repeated `length` times
) ) ]
dt[, lapply(.SD, sample), by=split.idx] # for each group (`.SD`), cycle through each column and `sample`
produces (note, I'm subsetting the result to something easily displayed):
split.idx V1 V2 V3 V4 V5 V6 V7 V8 V9
1: 1-50 14 8 9 40 42 47 4 38 5
2: 1-50 49 11 39 31 15 4 17 18 25
3: 1-50 29 25 11 5 37 12 11 29 11
4: 1-50 6 7 37 23 28 21 22 45 36
5: 1-50 16 30 4 46 35 17 46 47 47
6: 1-50 3 21 22 20 14 40 6 2 44
7: 51-52 51 51 51 52 52 51 52 52 52
8: 51-52 52 52 52 51 51 52 51 51 51
9: 53-102 71 61 56 87 77 81 78 69 64
10: 53-102 101 73 59 92 63 59 101 84 96
11: 53-102 74 98 75 62 60 85 73 70 97
12: 53-102 73 94 64 79 95 77 81 74 53
13: 53-102 88 63 71 86 57 82 61 61 93
14: 53-102 91 65 76 96 82 76 77 62 68
15: 53-102 97 55 68 74 83 64 91 77 91
16: 53-102 96 68 96 98 86 83 82 54 102
17: 53-102 64 62 73 64 79 65 60 90 57
18: 53-102 63 99 74 78 56 68 59 81 79
19: 53-102 93 79 78 71 85 57 88 91 65
20: 53-102 84 101 72 65 87 56 65 64 61
21: 53-102 76 81 99 63 96 73 67 67 63
split.idx V1 V2 V3 V4 V5 V6 V7 V8 V9
You can see clearly, in particular from the 51-52 group, that that one only has values 51-52. And here is the data I used:
library(data.table)
set.seed(1)
dt <- data.table(replicate(50, 1:700))
Upvotes: 0