Steve

Reputation: 55

How to randomly select rows, and random elements within each row, from matrices in a list

I have a list of 5x5 matrices. From each matrix I want to randomly select 2 rows, and within each selected row I want to select 3 elements, not necessarily from the same columns.

So, I generated three data sets and put them in a list. I was able to select 2 rows randomly, but I am having difficulty selecting 3 random elements per row rather than 3 whole columns.

Here is my code.

    ### Generate three data sets
    dat1 <- (matrix(rnorm(25), ncol=5))
    dat2 <- (matrix(rnorm(25), ncol=5))
    dat3 <- (matrix(rnorm(25), ncol=5))

    all.dat <- list(dat1=dat1, dat2=dat2, dat3=dat3)
    all.dat

    #$`dat1`
    #           [,1]      [,2]       [,3]        [,4]       [,5]
    #[1,]  1.4394742 0.7064418 -1.3472468  0.52847179 -0.7642337
    #[2,]  0.2490570 0.7510308 -0.7028238 -0.09730666 -0.6340773
    #[3,]  0.8981850 0.7592610  0.9139721 -0.45700647 -0.2727481
    #[4,] -1.0467119 0.2147032 -3.2104254 -0.17797056  0.8897180
    #[5,] -0.5437118 0.5803862 -0.1814992  1.93316139 -1.3708932

    #$dat2
    #          [,1]       [,2]          [,3]         [,4]       [,5]
    #[1,] 1.0442187 -1.4156893  0.5606035101 -1.350030718  0.1538721
    #[2,] 0.2080905 -1.7748005  0.8620324724 -0.169071336 -1.7537700
    #[3,] 0.9153835 -0.9884572 -1.7279901136 -1.334516414  0.5773021
    #[4,] 0.1359423 -1.5107088 -1.4289650078 -0.002001498 -0.4712699
    #[5,] 0.1695023 -0.7315209 -0.0003996577 -1.043326258  1.2939485

    #$dat3
    #           [,1]        [,2]         [,3]       [,4]       [,5]
    #[1,] -1.4994878 -0.59179084  0.998017255  1.4021344  0.5929842
    #[2,]  0.3424003  1.33568858  2.214968765 -0.2434351  1.3588000
    #[3,] -1.0117892  0.91065720 -0.761932994 -0.8117838 -0.4094731
    #[4,] -0.1694781 -0.02937177 -0.826337270  0.2178774 -0.6427046
    #[5,]  0.3413101 -0.56911900  0.001363063  0.5579126 -0.9373204

    ### Select rows and columns.
    all.dat.sel.1 <- lapply(all.dat, function(x) {
      x[sample(nrow(x), size = 2), sample(ncol(x), size = 3)]
    })

    all.dat.sel.1

    #$`dat1`
    #           [,1]       [,2]       [,3]
    #[1,] -0.4570065  0.8981850 -0.2727481
    #[2,]  1.9331614 -0.5437118 -1.3708932

    #$dat2
    #              [,1]         [,2]       [,3]
    #[1,] -0.0003996577 -1.043326258  1.2939485
    #[2,] -1.4289650078 -0.002001498 -0.4712699

    #$dat3
    #           [,1]      [,2]       [,3]
    #[1,] -1.4994878 1.4021344  0.9980173
    #[2,] -0.1694781 0.2178774 -0.8263373

With this, the rows are selected randomly, but the elements in each row come from the same columns. For example, the values -1.4994878 in row 1 and -0.1694781 in row 2 are both from column 1 of dat3.
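To make the behaviour concrete, here is a minimal sketch on a small deterministic matrix (the names `m`, `rows`, `cols`, `idx` are only for illustration and are not part of the code above). Indexing with two vectors, `m[rows, cols]`, returns every chosen row crossed with every chosen column, so both rows necessarily share the same columns. Indexing with a two-column matrix of (row, column) pairs picks individual cells instead.

```r
# 5x5 matrix filled column by column with 1..25, so m[i, j] == i + 5 * (j - 1)
m <- matrix(1:25, nrow = 5)

rows <- c(1, 4)
cols <- c(1, 3, 5)

# Two index vectors: each selected row is crossed with each selected column
block <- m[rows, cols]
block

# A two-column index matrix selects individual (row, column) cells,
# so each row can use its own set of columns
idx <- cbind(row = c(1, 1, 1, 4, 4, 4),
             col = c(1, 3, 5, 2, 3, 4))
picked <- matrix(m[idx], nrow = 2, byrow = TRUE)
picked
```

In `block` both rows come from columns 1, 3 and 5, while in `picked` row 1 uses columns 1, 3, 5 and row 4 uses columns 2, 3, 4.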

What I would like to have is something like this:

    #$dat3
    #           [,1]        [,2]         [,3]
    #[1,] -1.4994878 0.998017255    0.5929842
    #[4,]  0.2178774 -0.02937177 -0.826337270

There is a similar example here (https://stackoverflow.com/questions/53095050/sample-random-column-for-each-row-in-data-frame). However, it applies to a single data frame, not to a list of matrices.

Upvotes: 3

Views: 433

Answers (2)

Rui Barradas

Reputation: 76402

Take advantage of the fact that a matrix is a folded vector, that is, a vector with a dim attribute, and sample 2*3 elements from it directly. Note that this draws the 6 values from anywhere in the matrix, not from 2 specific rows.

    lapply(all.dat, function(x){
        matrix(sample(x, 2*3), nrow = 2)
    })

    #$dat1
    #           [,1]       [,2]        [,3]
    #[1,]  0.5060559 -0.5644520 -0.83717168
    #[2,] -0.6937202 -0.4771927  0.06445882
    #
    #$dat2
    #          [,1]      [,2]      [,3]
    #[1,] -0.709440 -1.340993 0.5747557
    #[2,] -1.068643  1.449496 1.1022975
    #
    #$dat3
    #          [,1]      [,2]         [,3]
    #[1,] 0.6482866 0.5630558 -0.007604756
    #[2,] 0.6565885 1.3295648 -0.669633580

Note: I have started the script with the call set.seed(1234).
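As a small illustration of the "folded vector" idea, the sketch below uses a toy matrix (`m`, `v` and `s` are illustrative names, not part of the answer's code). It shows that the only attribute of a plain matrix is `dim`, that single-index access walks the data column by column, and that `sample()` therefore treats the matrix as one long vector.

```r
m <- matrix(1:6, nrow = 2)

# A plain matrix carries a single attribute: its dimensions
attributes(m)

# Dropping the dim attribute leaves the underlying column-major vector
v <- as.vector(m)

# Single-index access uses that vector; element 4 is the cell [2, 2]
m[4] == m[2, 2]

# sample() sees the underlying vector and returns a plain vector,
# so the drawn values can come from any row or column
set.seed(1)
s <- sample(m, 4)
is.null(dim(s))
```

This is why the result has to be wrapped in `matrix(..., nrow = 2)` again, and why the values in a given output row are not tied to one row of the input.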

Edit.

After rereading the question and the comment by @Ronak Shah, I think the code below is closer to what the OP is looking for: it first samples 2 rows and then samples 3 elements independently within each of those rows. It's similar to, but not the same as, Ronak's solution. Once again, the RNG seed was set to 1234 before the data creation code.

    lapply(all.dat, function(x){
        t(apply(x[sample(nrow(x), 2), ], 1, sample, size = 3))
    })

    #$dat1
    #           [,1]      [,2]       [,3]
    #[1,] -0.4771927 -1.207066  0.5060559
    #[2,] -0.4405479  1.084441 -0.9111954
    #
    #$dat2
    #           [,1]       [,2]      [,3]
    #[1,]  1.1022975 -0.9685143  1.449496
    #[2,] -0.2942939 -0.5012581 -0.280623
    #
    #$dat3
    #           [,1]         [,2]     [,3]
    #[1,] -0.3665239 -0.773353424 1.367827
    #[2,]  0.3364728 -0.007604756 2.070271
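The `t()` in the code above is needed because `apply(..., 1, f)` stores each row's result as a column of its output. A minimal sketch with a deterministic stand-in for `sample` makes this visible (`m`, `first3` and `raw` are illustrative names only):

```r
# Two rows: 1..5 and 6..10
m <- matrix(1:10, nrow = 2, byrow = TRUE)

# Deterministic stand-in for sample(): keep the first three elements of a row
first3 <- function(r) r[1:3]

# apply() over rows returns one COLUMN per input row
raw <- apply(m, 1, first3)
dim(raw)

# Transposing restores one output row per input row
t(raw)
```

Here `raw` is 3 x 2 and `t(raw)` is 2 x 3, with each output row built from a single input row.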

Upvotes: 2

Ronak Shah

Reputation: 388862

I think what you are trying to do is this:

    row_const <- 2
    col_const <- 3

    lapply(all.dat, function(x) {
        rand_rows <- sample(nrow(x), size = row_const)
        t(sapply(rand_rows, function(y) sample(x[y, ], col_const)))
    })

    #$dat1
    #           [,1]       [,2]       [,3]
    #[1,] 0.07050839 -0.6868529  0.7013559
    #[2,] 0.40077145 -1.0260044 -1.9666172

    #$dat2
    #           [,1]      [,2]      [,3]
    #[1,] -0.3059627 -1.138137 2.1689560
    #[2,] -0.2950715  0.837787 0.5539177

    #$dat3
    #          [,1]       [,2]       [,3]
    #[1,] 0.3796395 -0.4910312  0.2533185
    #[2,] 0.9222675  0.1238542 -1.0185754

It first selects 2 random rows from each matrix and then selects 3 random elements from each of those rows.
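If it helps to see which source row each output row came from (as in the desired output in the question, labelled `[1,]` and `[4,]`), the same two-step idea can be wrapped in a small helper that keeps the row indices as row names. This is only a sketch: `pick_rows_then_elements` and its arguments are hypothetical names, and it samples column positions rather than values, which is an implementation choice and not something the code above does.

```r
set.seed(123)
dat <- matrix(rnorm(25), ncol = 5)

# Hypothetical helper: sample n_rows rows, then n_elems elements within each row
pick_rows_then_elements <- function(x, n_rows = 2, n_elems = 3) {
  # Step 1: choose the rows (sorted only to make the labels easier to read)
  rows <- sort(sample(nrow(x), size = n_rows))

  # Step 2: for each chosen row, choose its own set of column positions
  vals <- vapply(
    rows,
    function(i) x[i, sample(ncol(x), size = n_elems)],
    numeric(n_elems)
  )

  # vapply() stores each row's values as a column; refill row by row
  out <- matrix(vals, nrow = n_rows, byrow = TRUE)
  rownames(out) <- rows  # label each output row with its source row index
  out
}

res <- pick_rows_then_elements(dat)
res

# Apply to every matrix in a list, as in the question
lapply(list(dat1 = dat), pick_rows_then_elements)
```

The row names of `res` give the original row indices, so each output row can be checked against the corresponding row of `dat`.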

data

    set.seed(123)
    dat1 <- (matrix(rnorm(25), ncol=5))
    dat2 <- (matrix(rnorm(25), ncol=5))
    dat3 <- (matrix(rnorm(25), ncol=5))
    all.dat <- list(dat1=dat1, dat2=dat2, dat3=dat3)

Upvotes: 1
