user1830307
user1830307

Reputation:

Reorder random columns between blocked rows in a data frame in R

I would like to randomly reorder each column separately between given blocks of rows of a data frame. The row blocks are consecutive as shown:

mylist=list(1:50,51:52,53:102,103:128,129:154,155:180,181:206,207:232,233:258,259:284,285:310,311:336,337:362,363:388,389:414,415:440,441:466,467:492,493:518,519:544,545:570,571:596,597:622,623:648,649:674,675:700)

Say I have a data.frame called dat. It is 700 rows and 50 columns. So basically, for each of these 26 row blocks, I would like each column to be randomly reordered within.

An example with a smaller data.frame could be A =

1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9

where each row bin is as follows:

mylist=list(1:2,3:6,7:9)

could lead to the reordered data frame B=

1 2 1 1 1
2 1 2 2 2
3 4 3 5 3
4 6 4 3 4
5 5 5 6 5
6 3 6 4 6
8 9 8 7 9
9 7 9 8 8
7 8 7 9 7

Thank you.

Upvotes: 2

Views: 372

Answers (4)

Henrik
Henrik

Reputation: 67828

You may try this:

# create a 'blocking variable'
block <- rep(x = seq_along(mylist), times = sapply(mylist, length))

# within each block, loop over columns and 'shuffle' each column using `sample`
set.seed(1)
B <- do.call(rbind.data.frame,
             by(A, block, function(dat){
               sapply(dat, function(x) sample(x))
               })
             )

B
#     V1 V2 V3 V4 V5
# 1.1  1  2  1  2  2
# 1.2  2  1  2  1  1
# 2.1  3  6  4  5  3
# 2.2  6  4  5  3  4
# 2.3  4  5  6  6  5
# 2.4  5  3  3  4  6
# 3.1  8  7  9  8  9
# 3.2  9  8  7  9  8
# 3.3  7  9  8  7  7

Update following @Ananda Mahto's comment:
If B doesn't have to be a data frame and you are happy with a matrix instead, do.call(rbind.data.frame, can be replaced with do.call(rbind,, which will be faster.

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

Here is one approach. It doesn't require a data.frame named "A" to exist first and, like BrodieG's answer, assumes that there are no holes or repeats in "mylist".

This will result in a matrix with the number of columns specified by Ncol.

Ncol <- 50                                # Number of columns
A1 <- seq_along(unlist(mylist, use.names = FALSE))
do.call(rbind,                            # ^^ Generate a sequence 
        lapply(mylist, function(x) {      # Traverse the list
          replicate(Ncol, sample(A1[x]))  # Use replicate with sample
        }))

Here it is applied to your small list:

mylist <- list(1:2,3:6,7:9)
set.seed(1)   # to be able to reproduce this answer
Ncol <- 5
A1 <- seq_along(unlist(mylist, use.names = FALSE))
do.call(rbind,      
        lapply(mylist, function(x) {  
          replicate(Ncol, sample(A1[x]))
        }))
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    1    2    1    2    2
#  [2,]    2    1    2    1    1
#  [3,]    3    6    4    5    3
#  [4,]    6    4    5    3    4
#  [5,]    4    5    6    6    5
#  [6,]    5    3    3    4    6
#  [7,]    8    7    9    8    9
#  [8,]    9    8    7    9    8
#  [9,]    7    9    8    7    7

Another option to consider is shuffle from the "permute" package. For this, you create a grouping variable, as Henrik did in his answer, and you use that as the "blocks" within which to shuffle the given range of values.

library(permute)
mylist <- list(1:2,3:6,7:9)
block <- how(blocks = rep(seq_along(mylist), sapply(mylist, length)))
shuffle(length(block$blocks), block)
# [1] 2 1 4 5 3 6 7 9 8

You can easily use replicate to get a matrix with multiple columns:

set.seed(1)
replicate(5, shuffle(length(block$blocks), block))
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    1    1    1    1    2
#  [2,]    2    2    2    2    1
#  [3,]    5    3    6    4    5
#  [4,]    6    5    3    6    4
#  [5,]    3    6    4    5    6
#  [6,]    4    4    5    3    3
#  [7,]    9    8    7    7    9
#  [8,]    8    9    9    8    8
#  [9,]    7    7    8    9    7

Upvotes: 1

Max Li
Max Li

Reputation: 5219

This will do the trick

dat_new<-dat[,unlist(mapply(function(x) sample(x),mylist))]

Explanation:

sample() picks a random permutation from x

mapply(function(x) sample(x), mylist) will apply sample() to the single blocks

with unlist() you merge the resulting list and put it dat[,here]

Edit Data proof.

mylist<-list(1:2,3:5)
dat<-data.frame(a=1:2,b=2:3,c=3:4,d=4:5,e=5:6)

dat contains

a b c d e
1 1 2 3 4 5
2 2 3 4 5 6

Reshuffling should take place among (a,b) and among (c,d,e)

dat_new<-dat[,unlist(mapply(function(x) sample(x),mylist))]

dat_new contains

b a d c e

1 2 1 4 3 5

2 3 2 5 4 6

Upvotes: 0

BrodieG
BrodieG

Reputation: 52697

Assuming that your mylist is fully contiguous without holes or repeats (i.e. unlist(mylist) == 1:length(unlist(mylist)), as is the case with the one you provided, then you can do this relatively easily with any of the "split-apply-combine" methods. Here is a data.table implementation, where we first create a splitting index that just repeats a label for each group by the numbers of items in that group, and then we split / re-order by group.

dt[, split.idx:=unlist(
  lapply(
    mylist,                                                        # for each item in mylist
      function(x) rep(paste0(range(x), collapse="-"), length(x))   # create "min-max" label repeated `length` times
) ) ]
dt[, lapply(.SD, sample), by=split.idx]  # for each group (`.SD`), cycle through each column and `sample`

produces (note, I'm subsetting the result to something easily displayed):

    split.idx  V1  V2 V3 V4 V5 V6  V7 V8  V9
 1:      1-50  14   8  9 40 42 47   4 38   5
 2:      1-50  49  11 39 31 15  4  17 18  25
 3:      1-50  29  25 11  5 37 12  11 29  11
 4:      1-50   6   7 37 23 28 21  22 45  36
 5:      1-50  16  30  4 46 35 17  46 47  47
 6:      1-50   3  21 22 20 14 40   6  2  44
 7:     51-52  51  51 51 52 52 51  52 52  52
 8:     51-52  52  52 52 51 51 52  51 51  51
 9:    53-102  71  61 56 87 77 81  78 69  64
10:    53-102 101  73 59 92 63 59 101 84  96
11:    53-102  74  98 75 62 60 85  73 70  97
12:    53-102  73  94 64 79 95 77  81 74  53
13:    53-102  88  63 71 86 57 82  61 61  93
14:    53-102  91  65 76 96 82 76  77 62  68
15:    53-102  97  55 68 74 83 64  91 77  91
16:    53-102  96  68 96 98 86 83  82 54 102
17:    53-102  64  62 73 64 79 65  60 90  57
18:    53-102  63  99 74 78 56 68  59 81  79
19:    53-102  93  79 78 71 85 57  88 91  65
20:    53-102  84 101 72 65 87 56  65 64  61
21:    53-102  76  81 99 63 96 73  67 67  63
    split.idx  V1  V2 V3 V4 V5 V6  V7 V8  V9

You can see clearly, in particular from the 51-52 group, that that one only has values 51-52. And here is the data I used:

library(data.table)
set.seed(1)
dt <- data.table(replicate(50, 1:700))

Upvotes: 0

Related Questions