tbkent

Reputation: 63

Sampling column values in a matrix, without replacement

I have some experience with R, but I always struggle to write new code. I've found several very helpful posts here while working on my current project, but I can't seem to find the next step. Here's what I've done so far:

Now I need to sample 5 columns from each row, without replacement of columns. That is, I need to use each column only once and have five values in each row. (I don't have a preference on whether this gives me a matrix with 20 values in the right places and 60 zeros, or 4 vectors of 5 values. I guess I sort of want the matrix?)

If context helps, I'm trying to create groups based on topic rankings in a classroom. Rows are topics and columns are voters (students). Ultimately I want to create these random assignments in a for loop and run the program many times, so that the choices are optimized automatically (by some measurement; obviously there are different ways to optimize) rather than by staring at the original matrix, which is what I've done in the past.

This is my 4x20 matrix:

    J  E  I  S  A  N  H  T  M  B  D  K  O  G  P  L  Q  R  F  C
2   5  4  1  1  5 13  3  4 13 11 14 14 20  9 15  9 11 17  9 15
13 20 19 17 19 19  7  4 19  7  1  5  1 17 15 10  6  7 14  6  3
14 18  2 12 14 11 19 18 15 19  4  8 19  2  2 13  7  9  1 12 10
18  4  7 18  5 12 18  2 20  6  7 16 15  5 18  1 13  2 18 14 16

This is (one version of) what I want:

    J  E  I  S  A  N  H  T  M  B  D  K  O  G  P  L  Q  R  F  C
2   0  4  1  1  0  0  3  4  0  0  0  0  0  0  0  0  0  0  0  0
13  0  0  0  0  0  7  0  0  0  1  5  1  0  0  0  0  0  0  0  3
14  0  0  0  0 11  0  0  0  0  0  0  0  0  2  0  7  0  1 12  0
18  4  0  0  0  0  0  0  0  6  0  0  0  5  0  1  0  2  0  0  0
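The plan described above (generate many random assignments in a loop and keep the best one) could be sketched roughly as follows. This is only a sketch under stated assumptions: it assumes lower values mean a more preferred topic and scores an assignment by the sum of the kept values (smaller is better). The scoring rule, the 1000 iterations, and the helper names random_assignment and score_assignment are all hypothetical, and mat is a random stand-in for the 4x20 matrix.

```r
# Stand-in for the question's 4 x 20 matrix: rows are topics, columns are students
set.seed(1)
mat <- matrix(sample(20, 80, replace = TRUE), nrow = 4,
              dimnames = list(c("2", "13", "14", "18"),
                              strsplit("JEISANHTMBDKOGPLQRFC", "")[[1]]))

# Hypothetical helper: give each topic (row) 5 distinct students (columns),
# using every column exactly once; all other cells become 0
random_assignment <- function(m, per_row = 5) {
  idx <- cbind(seq_len(nrow(m)), sample(ncol(m), per_row * nrow(m)))
  out <- m
  out[] <- 0           # keep dims and dimnames, zero every cell
  out[idx] <- m[idx]   # copy back only the chosen cells
  out
}

# Hypothetical score: total rank of the kept cells (assumes 1 = favorite)
score_assignment <- function(a) sum(a)

best <- NULL
best_score <- Inf
for (k in 1:1000) {
  candidate <- random_assignment(mat)
  s <- score_assignment(candidate)
  if (s < best_score) {   # keep the lowest-scoring assignment seen so far
    best <- candidate
    best_score <- s
  }
}
best
best_score
```

Random search like this is easy to write but gives no guarantee of finding the optimum; it simply keeps the best of the assignments it happened to draw.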

Upvotes: 4

Views: 5490

Answers (4)

Sven Hohenstein

Reputation: 81693

You can use apply. The following command will randomly sample five values from each row and return a matrix of the results:

apply(mat, 1, sample, 5)

apply returns a 5 x 4 matrix (one column per row of mat), so you might wish to transpose it with t to match the orientation of the original matrix.
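A small sketch of the apply call with the transpose applied, using a random stand-in for mat. Note that sample here draws from each row independently, so the same column (student) can be picked for more than one row.

```r
set.seed(2)
mat <- matrix(sample(20, 80, replace = TRUE), nrow = 4)  # stand-in data

# apply() returns one column per row of mat ...
picked <- apply(mat, 1, sample, 5)
dim(picked)    # 5 4

# ... so transpose it to get one row per topic
picked_t <- t(picked)
dim(picked_t)  # 4 5
```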


If you want to use every column only once, you can use the following command:

mat[cbind(seq(nrow(mat)), sample(ncol(mat), 5 * nrow(mat)))]

It returns a vector of the selected values.
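Because seq(nrow(mat)) is recycled along the 20 sampled column numbers, the vector is interleaved by row: with 4 rows, elements 1, 5, 9, ... come from row 1, elements 2, 6, 10, ... from row 2, and so on. Filling a matrix column by column with nrow = nrow(mat) therefore puts each row's five values back in its own row. A minimal sketch with a random stand-in for mat, which also keeps the matching column numbers:

```r
set.seed(3)
mat <- matrix(sample(20, 80, replace = TRUE), nrow = 4)  # stand-in data

# Row index 1:4 is recycled against 20 distinct column numbers
idx <- cbind(seq(nrow(mat)), sample(ncol(mat), 5 * nrow(mat)))
vals <- mat[idx]

# matrix() fills column-wise, so element k lands in row ((k - 1) %% 4) + 1,
# which is the row it was taken from
by_row <- matrix(vals, nrow = nrow(mat))
cols_by_row <- matrix(idx[, 2], nrow = nrow(mat))  # column numbers, same layout
```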

To match the desired output format (a matrix of zeros with the randomly chosen values in place), you can use the following strategy:

# create an index of the values to be kept
idx <- cbind(seq(nrow(mat)), sample(ncol(mat), 5 * nrow(mat)))

# create a new matrix of zeroes with the same row and column names
mat2 <- matrix(0, ncol = ncol(mat), nrow = nrow(mat), dimnames = dimnames(mat))

# copy the values from the original matrix to the new one
mat2[idx] <- mat[idx]
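A quick way to confirm that the result has the required shape (five kept values per row, each column used exactly once) is to count the non-zero cells. This relies on the ranks being 1 to 20, so a kept value is never 0; mat is a random stand-in here.

```r
set.seed(4)
mat <- matrix(sample(20, 80, replace = TRUE), nrow = 4,
              dimnames = list(c("2", "13", "14", "18"),
                              strsplit("JEISANHTMBDKOGPLQRFC", "")[[1]]))

idx <- cbind(seq(nrow(mat)), sample(ncol(mat), 5 * nrow(mat)))
# Copy the dimnames so the topic and student labels survive
mat2 <- matrix(0, nrow = nrow(mat), ncol = ncol(mat), dimnames = dimnames(mat))
mat2[idx] <- mat[idx]

rowSums(mat2 != 0)  # 5 for every topic
colSums(mat2 != 0)  # 1 for every student
```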

Upvotes: 8

Neal Fultz

Reputation: 9687

Using the Matrix package, we can build this pretty easily from indices:

library(Matrix)

i <- sample(nrow(X), ncol(X), replace=TRUE)  # a random row for each column
j <- seq(ncol(X))                            # every column exactly once
sparseMatrix(i,j,x=X[cbind(i,j)])

yields:

> sparseMatrix(i,j,x=X[cbind(i,j)])
4 x 20 sparse Matrix of class "dgCMatrix"

[1,] . .  .  .  . 13 .  . 13 . 14  . . 9  .  . .  . . 15
[2,] . .  .  .  .  . .  .  . .  .  . . .  .  . .  . 6  .
[3,] . .  . 14 11  . . 15  . 4  . 19 2 . 13  . .  . .  .
[4,] 4 7 18  .  .  . 2  .  . .  .  . . .  . 13 2 18 .  .
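With replace=TRUE, each column is assigned to exactly one row, but the number of columns per row is left to chance (in the output above the rows get 5, 1, 7 and 7 values). If each row should get exactly five, one option, sketched here as a variation rather than part of the original answer, is to shuffle a balanced vector of row numbers instead. X is a random stand-in.

```r
library(Matrix)

set.seed(5)
X <- matrix(sample(20, 80, replace = TRUE), nrow = 4)  # stand-in data

# Each row number appears ncol / nrow = 5 times; shuffling assigns every
# column to one row while keeping exactly five columns per row
i <- sample(rep(seq(nrow(X)), length.out = ncol(X)))
j <- seq(ncol(X))
S <- sparseMatrix(i, j, x = X[cbind(i, j)], dims = dim(X))
S
```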

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

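Assuming your data.frame is called "x", here is a simple approach: lapply builds a list of single-row data.frames (each with the unchosen columns set to 0), and do.call(rbind, ...) stacks them back into one data.frame.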

Here's your data:

x <- structure(list(J = c(5L, 20L, 18L, 4L), E = c(4L, 19L, 2L, 7L
  ), I = c(1L, 17L, 12L, 18L), S = c(1L, 19L, 14L, 5L), A = c(5L, 
  19L, 11L, 12L), N = c(13L, 7L, 19L, 18L), H = c(3L, 4L, 18L, 
  2L), T = c(4L, 19L, 15L, 20L), M = c(13L, 7L, 19L, 6L), B = c(11L, 
  1L, 4L, 7L), D = c(14L, 5L, 8L, 16L), K = c(14L, 1L, 19L, 15L
  ), O = c(20L, 17L, 2L, 5L), G = c(9L, 15L, 2L, 18L), P = c(15L, 
  10L, 13L, 1L), L = c(9L, 6L, 7L, 13L), Q = c(11L, 7L, 9L, 2L), 
      R = c(17L, 14L, 1L, 18L), F = c(9L, 6L, 12L, 14L), C = c(15L, 
      3L, 10L, 16L)), .Names = c("J", "E", "I", "S", "A", "N", 
  "H", "T", "M", "B", "D", "K", "O", "G", "P", "L", "Q", "R", "F", 
  "C"), class = "data.frame", row.names = c("2", "13", "14", "18"
  ))

And the sampling:

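set.seed(1)  # exact results may differ in R >= 3.6.0, which changed sample()
# 4 x 5 matrix holding a random permutation of the 20 column numbers;
# row y lists the 5 columns that row y of x keeps
temp <- matrix(sample(20), nrow = 4)
do.call(rbind, lapply(1:4, function(y) {
  x[y, -temp[y, ]] <- 0   # zero out every other column in row y
  x[y, ]
}))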
#     J E  I S  A  N H  T M B D  K O  G  P  L Q R F  C
# 2   0 0  0 1  0 13 0  0 0 0 0 14 0  0  0  0 0 0 9 15
# 13 20 0  0 0  0  0 0 19 0 1 0  0 0 15  0  0 7 0 0  0
# 14  0 0 12 0 11  0 0  0 0 0 8  0 0  0 13  0 0 1 0  0
# 18  0 7  0 0  0  0 2  0 6 0 0  0 5  0  0 13 0 0 0  0
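Since the question also allows "4 vectors of 5 values", the same temp index can produce a named list instead, one vector per topic, keeping the student (column) names. A minimal sketch; a random stand-in replaces the data.frame x above so the block runs on its own.

```r
set.seed(1)
x <- as.data.frame(matrix(sample(20, 80, replace = TRUE), nrow = 4,
                          dimnames = list(c("2", "13", "14", "18"),
                                          strsplit("JEISANHTMBDKOGPLQRFC", "")[[1]])))

temp <- matrix(sample(20), nrow = 4)  # row y keeps columns temp[y, ]

# One named vector of 5 values per topic
groups <- lapply(seq_len(nrow(x)), function(y) unlist(x[y, temp[y, ]]))
names(groups) <- rownames(x)
groups
```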

Upvotes: 1

Rcoster

Reputation: 3210

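This should work. It uses fake data in place of your matrix and returns a 4 x 5 matrix with the five sampled values from each row: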

data <- matrix(sample(letters, 20*4, replace = TRUE), 4) # Create fake data

perm <- sample(1:20) # Scramble the order of the columns (avoids masking sample())

out <- matrix(0, 4, 5) # 5 values for each of the 4 rows

for (i in 1:4) {
 out[i,] <- data[i, perm[1:5 + (i-1)*5]] # Take row i's own block of 5 columns
}
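For forming groups you probably also want to know which students (columns) ended up with each topic. A variant of the loop above, sketched under the assumption that the data is a labelled 4x20 matrix like the question's (here a random stand-in), records the column names alongside the values.

```r
set.seed(6)
data <- matrix(sample(20, 80, replace = TRUE), nrow = 4,
               dimnames = list(c("2", "13", "14", "18"),
                               strsplit("JEISANHTMBDKOGPLQRFC", "")[[1]]))

perm <- sample(ncol(data))             # shuffled column order
values <- matrix(0, nrow(data), 5)     # 5 kept values per row
students <- matrix("", nrow(data), 5)  # names of the matching columns

for (i in seq_len(nrow(data))) {
  cols <- perm[1:5 + (i - 1) * 5]      # this row's block of 5 columns
  values[i, ] <- data[i, cols]
  students[i, ] <- colnames(data)[cols]
}
rownames(values) <- rownames(students) <- rownames(data)
students
```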

Upvotes: 1
