cephalopod

Reputation: 1906

Computing probabilities in R

I have two questions that I'd like to use R to solve.

I have a vector of values whose distribution is unknown.

1. How do I calculate the probability of one of the values in the vector in R?
2. How do I calculate the probability of one value occurring by simulating 1000 times?

My test data is as follows:

values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

Grateful for any assistance.

Upvotes: 0

Answers (2)

CompSocialSciR

Reputation: 593

To estimate the probability of a value from the unknown distribution, you can compute the empirical proportion of each value:

prop.table(table(values_all))

which outputs:

values_all
   1    2    3    4    5    6    7 
0.15 0.25 0.10 0.05 0.20 0.10 0.15
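To read off the probability of a single value such as prob_to_find from that table, one option (a sketch reusing the question's data) is to index the table by name. Table names are character strings, so the numeric value is converted with as.character().

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

# empirical proportion of each distinct value
props <- prop.table(table(values_all))

# table names are character, so index with the value converted to a string
p_value <- props[[as.character(prob_to_find)]]
p_value  # 0.2 (4 of the 20 values equal 5)
```

Note that indexing with [[ raises a "subscript out of bounds" error for a value that never occurs in the data, rather than returning 0.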

Alternatively, you can assume a distribution after inspecting your vector. For example, under a continuous uniform(1, 7) distribution, the probability of observing a value of at most 3 is:

> punif(3, min = 1, max = 7)
[1] 0.3333333

For guidance on this decision, refer to this StackExchange answer. Also note that for a continuous distribution the probability of any single value is zero by definition, so you should instead compute the probability of an interval, i.e. the difference between the CDF evaluated at two numeric values.
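A minimal sketch of the interval approach, assuming (as above) a continuous uniform(1, 7) model. Treating "the value 5" as the interval from 4.5 to 5.5 is an illustrative choice of ours, not something the data dictates.

```r
# assumed model: continuous uniform on [1, 7]
lower <- 4.5  # illustrative interval around the value 5
upper <- 5.5

# probability of the interval = difference of the CDF at its endpoints
p_interval <- punif(upper, min = 1, max = 7) - punif(lower, min = 1, max = 7)
p_interval  # 1/6, about 0.1667
```

Under this model the answer is about 0.167, whereas the empirical proportion of 5 in values_all is 0.20, which illustrates how much the result depends on the assumed distribution.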

To avoid such discretionary modelling decisions, simulation is often a safer choice. You can simply sample from the data with replacement:

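b <- vector("numeric", 1000)   # storage for 1000 simulated draws
set.seed(1234)                 # make the simulation reproducible
for (i in 1:1000) {
    # draw one value from the observed data, each element equally likely
    b[i] <- sample(values_all, size = 1, replace = TRUE)
}
prop.table(table(b))           # proportion of draws equal to each value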

Which returns:

b
    1     2     3     4     5     6     7 
0.144 0.251 0.087 0.053 0.207 0.099 0.159

That is, the simulated probability of the value 3 is 8.7%, close to its observed proportion of 10%.
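The loop can also be written as a single vectorized call that draws all 1000 values at once; this sketch then estimates the probability of prob_to_find from the question. Because it may consume random numbers differently from the loop, the exact proportions need not match the output above even with the same seed.

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

set.seed(1234)   # reproducible simulation
n_sim <- 1000    # number of simulated draws

# draw n_sim values from the observed data with replacement
draws <- sample(values_all, size = n_sim, replace = TRUE)

# simulated probability: share of draws equal to the target value
p_sim <- mean(draws == prob_to_find)
p_sim
```

With 1000 draws, p_sim should land near the exact proportion of 0.20; increasing n_sim narrows the gap.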

Upvotes: 3

P1storius

Reputation: 947

For question 1 you can use this:

values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

probability <- sum(values_all == prob_to_find) / length(values_all)

The probability is the number of times the value occurs (the count of TRUE entries in values_all == prob_to_find) divided by the total number of values in your vector.
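Since R treats TRUE as 1 and FALSE as 0 in arithmetic, the same ratio can be written with mean(). This small sketch also applies it to every distinct value, so the result can be compared with prop.table(table(values_all)).

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

# mean of a logical vector = proportion of TRUE values
probability <- mean(values_all == prob_to_find)
probability  # 0.2

# same calculation for each distinct value, labelled by value
vals <- sort(unique(values_all))
all_probs <- sapply(vals, function(v) mean(values_all == v))
names(all_probs) <- vals
all_probs
```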

For question 2, I have commented on your question because I need some extra information.

Upvotes: 2
