J.Q

Reputation: 1031

Sampling cells from matrix rows based on cell values

A 10x10 matrix contains "likelihoods" (weights) for each cell being selected within its row during a draw.

        id1 id2 id3 id4 id5 id6 id7 id8 id9 id10
id1     NA  0.5 0.7 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id2     0.5 NA  0.5 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id3     0.7 0.5 NA  0.5 0.5 0.4 0.4 0.4 0.4 0.4
id4     0.5 0.5 0.5 NA  0.5 0.4 0.4 0.4 0.4 0.4
id5     0.5 0.5 0.5 0.5 NA  0.4 0.4 0.4 0.4 0.4
id6     0.4 0.4 0.4 0.4 0.4 NA  0.5 0.7 0.5 0.5
id7     0.4 0.4 0.4 0.4 0.4 0.5 NA  0.5 0.5 0.5
id8     0.4 0.4 0.4 0.4 0.4 0.7 0.5 NA  0.5 0.5
id9     0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 NA  0.5
id10    0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 0.5 NA

Each draw is done by row: the chance of a cell being chosen is its value divided by the sum of all non-missing values in that row. For example, in row id1 I need to pick one cell from id2 to id10. The most likely choice is id3, because its value of 0.7 is the highest in the row.
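To make the rule concrete, here is a minimal sketch that turns the weights of row id1 into selection probabilities. The variable names w and p are only illustrative and do not come from the original post.

```r
# Weights for row id1, copied from the matrix above (NA marks the diagonal).
w <- c(id1 = NA, id2 = 0.5, id3 = 0.7, id4 = 0.5, id5 = 0.5,
       id6 = 0.4, id7 = 0.4, id8 = 0.4, id9 = 0.4, id10 = 0.4)

# Probability of each cell = its weight / sum of the non-missing weights.
p <- w / sum(w, na.rm = TRUE)
round(p, 3)

# which.max() discards NA, so this names the most likely column.
names(which.max(p))
```

The row sum here is 4.2, so id3 would be drawn with probability 0.7 / 4.2, about one time in six.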

I need a vector called result that stores the choice for each row after I choose. My current plan is to:

  1. sum across rows and store the results as a vector denom
  2. generate a random uniform variable between 0 and this sum for each row
  3. if the value is between 0 and 0.5, the chosen person in row 1 is id2; if it is between 0.5 and 1.2, the chosen person is id3, and so on.
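The three steps above can be sketched for a single row as follows. This is only an illustration of the manual plan: pick_one is a hypothetical helper name, and the seed is arbitrary.

```r
# Weights for row id1 with the NA on the diagonal already dropped.
w <- c(id2 = 0.5, id3 = 0.7, id4 = 0.5, id5 = 0.5,
       id6 = 0.4, id7 = 0.4, id8 = 0.4, id9 = 0.4, id10 = 0.4)

# Hypothetical helper: map a uniform value u in [0, sum(w)) to a name.
pick_one <- function(w, u) {
  upper <- cumsum(w)                       # upper edge of each interval
  idx <- findInterval(u, upper) + 1        # count of edges <= u, plus one
  names(w)[min(idx, length(w))]            # guard against u at the top edge
}

denom <- sum(w)                            # step 1: row sum
set.seed(1)                                # arbitrary seed, for repeatability
u <- runif(1, min = 0, max = denom)        # step 2: uniform draw on the row
choice <- pick_one(w, u)                   # step 3: look up the interval
choice
```

With these weights a value of 0.3 falls in the first interval (id2) and 0.6 falls in the second (id3), matching step 3.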

This is obviously way too much work. What's a better way to sample with weights while ignoring the NA values in the diagonal?

Upvotes: 0

Views: 93

Answers (2)

Calum You

Reputation: 15052

You can use apply with sample to randomly choose an element from each row. We create a custom function that wraps sample so that it drops the missing values on the diagonal and uses the right weights. Conveniently, the vector returned by na.omit keeps its names, so we can sample from the names and pass the remaining values as weights through the prob argument. sample rescales the weights itself, so there is no need to divide by the row sum first.

mat <- as.matrix(read.table(
text = "id1 id2 id3 id4 id5 id6 id7 id8 id9 id10
id1  NA  0.5 0.7 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id2  0.5 NA  0.5 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id3  0.7 0.5 NA  0.5 0.5 0.4 0.4 0.4 0.4 0.4
id4  0.5 0.5 0.5 NA  0.5 0.4 0.4 0.4 0.4 0.4
id5  0.5 0.5 0.5 0.5 NA  0.4 0.4 0.4 0.4 0.4
id6  0.4 0.4 0.4 0.4 0.4 NA  0.5 0.7 0.5 0.5
id7  0.4 0.4 0.4 0.4 0.4 0.5 NA  0.5 0.5 0.5
id8  0.4 0.4 0.4 0.4 0.4 0.7 0.5 NA  0.5 0.5
id9  0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 NA  0.5
id10 0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 0.5 NA"
))

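# Draw one column name from a row, weighted by the row's non-missing values.
foo <- function(row) {
  no_na <- na.omit(row)  # drop the NA on the diagonal; names are kept
  # prob takes the raw weights; sample() normalises them internally
  sample(x = names(no_na), size = 1, prob = no_na)
}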

result <- apply(mat, 1, foo)
result
#>    id1    id2    id3    id4    id5    id6    id7    id8    id9   id10 
#>  "id2"  "id9"  "id4"  "id2"  "id3"  "id8"  "id8" "id10"  "id3"  "id7"

Created on 2019-09-24 by the reprex package (v0.3.0)
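As a rough sanity check, one could repeat the draw for a single row many times and compare the observed frequencies with weight / row sum. The sketch below reuses foo on row id1 only; the seed and the number of repetitions are arbitrary choices, not part of the original answer.

```r
# Row id1 from the question's matrix (NA on the diagonal).
row <- c(id1 = NA, id2 = 0.5, id3 = 0.7, id4 = 0.5, id5 = 0.5,
         id6 = 0.4, id7 = 0.4, id8 = 0.4, id9 = 0.4, id10 = 0.4)

# Same wrapper as in the answer above.
foo <- function(row) {
  no_na <- na.omit(row)
  sample(x = names(no_na), size = 1, prob = no_na)
}

set.seed(42)                                   # arbitrary seed
draws <- replicate(20000, foo(row))            # repeat the single-row draw

# Observed share of each column, including columns that were never drawn.
obs <- as.numeric(table(factor(draws, levels = names(row)))) / length(draws)
expected <- row / sum(row, na.rm = TRUE)       # weight / row sum

round(cbind(observed = obs, expected = expected), 3)
```

The observed column should sit close to the expected one, and id1 should never be drawn because its NA weight is removed before sampling.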

Upvotes: 2

Marco De Virgilis

Reputation: 1087

I think what you need is the sample function: https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/sample
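For illustration, here is a minimal sketch of sample with the prob argument on a shortened version of row id1. It assumes the NA on the diagonal can be treated as a weight of zero, which is a different way of excluding it than dropping it; the seed and the number of draws are arbitrary.

```r
# First four cells of row id1; the NA is the diagonal entry.
w <- c(id1 = NA, id2 = 0.5, id3 = 0.7, id4 = 0.5)

# Assumption: a zero weight is an acceptable stand-in for the missing value.
w[is.na(w)] <- 0

set.seed(123)                                   # arbitrary seed
# prob weights need not sum to one; zero-weight entries are never selected.
draws <- sample(names(w), size = 1000, replace = TRUE, prob = w)
table(draws)
```

Because sample accepts unnormalised weights, no division by the row sum is needed before the call.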

Upvotes: 0
