Reputation: 1031
A 10x10 matrix contains "likelihoods" for each cell being selected in a given row during a draw.
id1 id2 id3 id4 id5 id6 id7 id8 id9 id10
id1 NA 0.5 0.7 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id2 0.5 NA 0.5 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id3 0.7 0.5 NA 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id4 0.5 0.5 0.5 NA 0.5 0.4 0.4 0.4 0.4 0.4
id5 0.5 0.5 0.5 0.5 NA 0.4 0.4 0.4 0.4 0.4
id6 0.4 0.4 0.4 0.4 0.4 NA 0.5 0.7 0.5 0.5
id7 0.4 0.4 0.4 0.4 0.4 0.5 NA 0.5 0.5 0.5
id8 0.4 0.4 0.4 0.4 0.4 0.7 0.5 NA 0.5 0.5
id9 0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 NA 0.5
id10 0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 0.5 NA
Each draw is done by row, and the chance of a cell being chosen is the value of that cell divided by the sum of all cell values in that row. For example, I need to pick a cell from id2 to id10 in the row id1. The most likely choice is id3 because its value of 0.7 is the highest in the row.
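To make the arithmetic concrete, here is a minimal sketch of the per-row normalisation for row id1 only. The names row_id1 and probs are made up for illustration; the values are copied from the matrix above. The row sums to 4.2, so id3 ends up with 0.7 / 4.2, roughly 0.167.

```r
# Row id1 of the matrix above, with NA on the diagonal
# (row_id1 and probs are illustrative names, not part of the original post)
row_id1 <- c(id1 = NA, id2 = 0.5, id3 = 0.7, id4 = 0.5, id5 = 0.5,
             id6 = 0.4, id7 = 0.4, id8 = 0.4, id9 = 0.4, id10 = 0.4)

# Divide each weight by the row sum, skipping the NA in the sum
probs <- row_id1 / sum(row_id1, na.rm = TRUE)

# The diagonal entry stays NA; the remaining entries sum to 1
round(probs, 3)
```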
I need a vector called result that stores the choice for each row after I choose. My current plan is to:
denom
This is obviously way too much work. What's a better way to sample with weights while ignoring the NA values in the diagonal?
Upvotes: 0
Views: 93
Reputation: 15052
You can use apply with sample to randomly choose an element from each row. We create a custom function that wraps sample to deal with the missing values on the diagonal and use the right weights. One convenient thing is that after removing the missing values with na.omit, the resulting vector still has names, so we can sample names using the corresponding values as weights with the prob argument.
mat <- as.matrix(read.table(
text = "id1 id2 id3 id4 id5 id6 id7 id8 id9 id10
id1 NA 0.5 0.7 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id2 0.5 NA 0.5 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id3 0.7 0.5 NA 0.5 0.5 0.4 0.4 0.4 0.4 0.4
id4 0.5 0.5 0.5 NA 0.5 0.4 0.4 0.4 0.4 0.4
id5 0.5 0.5 0.5 0.5 NA 0.4 0.4 0.4 0.4 0.4
id6 0.4 0.4 0.4 0.4 0.4 NA 0.5 0.7 0.5 0.5
id7 0.4 0.4 0.4 0.4 0.4 0.5 NA 0.5 0.5 0.5
id8 0.4 0.4 0.4 0.4 0.4 0.7 0.5 NA 0.5 0.5
id9 0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 NA 0.5
id10 0.4 0.4 0.4 0.4 0.4 0.5 0.5 0.5 0.5 NA"
))
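# Sample one column name from a row, weighted by its non-missing values
foo <- function(row) {
  no_na <- na.omit(row)  # drops the NA on the diagonal; names are kept
  sample(x = names(no_na), size = 1, prob = no_na)  # weights need not sum to 1
}
# Apply the sampler to each row (MARGIN = 1)
result <- apply(mat, 1, foo)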
result
#> id1 id2 id3 id4 id5 id6 id7 id8 id9 id10
#> "id2" "id9" "id4" "id2" "id3" "id8" "id8" "id10" "id3" "id7"
Created on 2019-09-24 by the reprex package (v0.3.0)
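If you want to convince yourself that the weights behave as intended, one rough check is to draw many times from a single row and compare the observed frequencies with value / row sum. This is only a sketch: the matrix is rebuilt by hand so the snippet runs on its own, foo is repeated from above, and the seed, the number of draws, and the names ids, draws, freq and expected are arbitrary choices for illustration.

```r
# Rebuild the same matrix as in the question so this runs standalone
ids <- paste0("id", 1:10)
mat <- matrix(0.4, 10, 10, dimnames = list(ids, ids))
mat[1:5, 1:5] <- 0.5
mat[6:10, 6:10] <- 0.5
mat[1, 3] <- mat[3, 1] <- 0.7
mat[6, 8] <- mat[8, 6] <- 0.7
diag(mat) <- NA

# Same sampler as above
foo <- function(row) {
  no_na <- na.omit(row)
  sample(x = names(no_na), size = 1, prob = no_na)
}

set.seed(1)  # arbitrary seed, only for reproducibility
draws <- replicate(10000, foo(mat["id1", ]))

# Observed share of each id versus the expected probability for row id1
freq <- table(factor(draws, levels = ids)) / length(draws)
expected <- mat["id1", ] / sum(mat["id1", ], na.rm = TRUE)
round(cbind(observed = as.vector(freq), expected = expected), 3)
```

The diagonal entry id1 should never be drawn, and the observed column should sit close to the expected one, with id3 the most frequent.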
Upvotes: 2
Reputation: 1087
I think what you need is the sample function: https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/sample
Upvotes: 0