Rumák

Reputation: 95

R - Given a vector of probabilities, how to find a threshold such that exactly n elements will be classified positive?

Let's say I have a vector of probabilities

> probs <- c(0.2, 0.3, 0.5, 0.7, 0.8, 0.9)
> probs
[1] 0.2 0.3 0.5 0.7 0.8 0.9

I want to classify each element as positive or negative by comparing it to some threshold value (for the sake of argument, let's say that an element with probability >= threshold is classified as positive; otherwise it is considered negative). I don't know what threshold value I want to use, but I know I want exactly 3 elements to be classified as positive.

My own solution would be to go over all the probabilities, try each one as the threshold value, and check whether it results in the desired number of positives:

> sum(probs >= 0.2)
[1] 6
> sum(probs >= 0.3)
[1] 5
> sum(probs >= 0.5)
[1] 4
> sum(probs >= 0.7)
[1] 3
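The manual trial above can be written as a single vectorised pass. This is only a sketch of the brute-force idea described in the question, assuming the candidate thresholds are the values of probs itself; the names counts and thresholds are introduced here for illustration.

```r
probs <- c(0.2, 0.3, 0.5, 0.7, 0.8, 0.9)
n <- 3

# For each candidate threshold (every value in probs), count how many
# elements would be classified as positive under `probs >= threshold`.
counts <- sapply(probs, function(t) sum(probs >= t))

# Keep only the candidates that yield exactly n positives.
# This can be empty if tied values make exactly n positives impossible.
thresholds <- probs[counts == n]
thresholds
```

For the example vector this leaves a single candidate, matching the last manual check above.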

Is there any function in R (libraries included) that would offer that functionality out-of-the-box?

EDIT: This problem has a rather straightforward solution (which makes a dedicated function unnecessary), so I will accept the top solution even though it doesn't answer the question as asked.

Upvotes: 0

Views: 195

Answers (1)

Ronak Shah

Reputation: 389355

You can sort the vector in decreasing order and select the nth value:

n <- 3
sort(probs, decreasing = TRUE)[n]
#[1] 0.7

Or with order:

probs[order(-probs)[n]]
#[1] 0.7
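One caveat worth noting: if the nth largest value is tied with the next one, comparing with >= yields more than n positives, and no threshold gives exactly n. A minimal sketch of a wrapper that checks for this follows; threshold_for_n is a hypothetical helper name, not a function from base R or any package, and it assumes probs contains no NA values.

```r
# Hypothetical helper (not from any package): returns the threshold that gives
# exactly n positives under `probs >= threshold`, or NA if ties make it impossible.
# Assumes probs has no NA values.
threshold_for_n <- function(probs, n) {
  stopifnot(n >= 1, n <= length(probs))
  # The nth largest value is the only candidate that can work.
  threshold <- sort(probs, decreasing = TRUE)[n]
  # Verify the count, since tied values can push it above n.
  if (sum(probs >= threshold) == n) threshold else NA_real_
}

res_ok  <- threshold_for_n(c(0.2, 0.3, 0.5, 0.7, 0.8, 0.9), 3)  # no ties at the cut
res_tie <- threshold_for_n(c(0.2, 0.7, 0.7, 0.7, 0.9), 3)       # 0.7 occurs three times
```

In the second call the third largest value is 0.7, but four elements are >= 0.7, so the helper returns NA rather than a threshold that silently produces the wrong count.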

Upvotes: 1
