I have two questions that I'd like to solve with R.
I have a vector of values whose distribution is unknown.
My test data is as follows:
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5
Grateful for any assistance.
To estimate the probability of a value from the unknown distribution, you can compute the empirical (relative) frequency of each value:
prop.table(table(values_all))
which outputs:
values_all
   1    2    3    4    5    6    7
0.15 0.25 0.10 0.05 0.20 0.10 0.15
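To pull out the probability of the single value you care about (prob_to_find from the question), you can index that table by name. A minimal sketch; the names probs and p_hat are only illustrative:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

# Relative frequency of each distinct value (count divided by the total, 20)
probs <- prop.table(table(values_all))

# The table's names are character strings, so convert the value before indexing
p_hat <- probs[[as.character(prob_to_find)]]
p_hat
#> [1] 0.2
```

Note that a value which never occurs in the data has no entry in the table, so indexing it this way raises a "subscript out of bounds" error rather than returning 0.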
Alternatively, you can assume a distribution after inspecting your vector. For example, under a continuous uniform(1, 7) distribution, the probability of a value less than or equal to 3 would be:
> punif(3, min = 1, max = 7)
[1] 0.3333333
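For reference, punif() evaluates the cumulative distribution function of the uniform distribution, so the result above is P(X ≤ 3), not the probability of exactly 3. Assuming X ~ Uniform(a, b) with a = 1 and b = 7, as in the example, the arithmetic is:

```latex
F(x) = \frac{x - a}{b - a}, \qquad a \le x \le b

P(X \le 3) = F(3) = \frac{3 - 1}{7 - 1} = \frac{2}{6} = \frac{1}{3} \approx 0.3333
```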
For guidance on making this decision, refer to this StackExchange answer. Also note that with a continuous distribution the probability of any single value is zero by definition, so you should instead compute the probability of an interval, i.e. the difference between the cumulative probabilities at two numeric (double) values.
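A minimal sketch of that interval calculation, still assuming the uniform(1, 7) model from above. The window [2.5, 3.5] around 3 is an arbitrary illustrative choice, not something prescribed by the data:

```r
# Under a continuous model, a single point has probability zero
p_point <- punif(3, min = 1, max = 7) - punif(3, min = 1, max = 7)
p_point
#> [1] 0

# Probability of an interval: CDF at the upper bound minus CDF at the lower bound
p_interval <- punif(3.5, min = 1, max = 7) - punif(2.5, min = 1, max = 7)
p_interval
#> [1] 0.1666667
```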
To avoid discretionary decisions about the distribution, running a simulation is often a safer choice. You can simply resample the data with replacement:
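b <- vector("numeric", 1000)  # pre-allocate storage for 1000 draws
set.seed(1234)                # make the random draws reproducible
for (i in 1:1000) {
  # draw one value at random from the observed data
  b[i] <- sample(values_all, size = 1, replace = TRUE)
}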
prop.table(table(b))
Which returns:
b
1 2 3 4 5 6 7
0.144 0.251 0.087 0.053 0.207 0.099 0.159
That is, an estimated probability of 8.7% for the value 3, close to the 10% empirical frequency computed directly from the data.
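The loop is not strictly needed: sample() can draw all 1000 values in a single call. A minimal vectorized sketch of the same resampling idea; the name sim_probs is illustrative, and the printed proportions are not guaranteed to match the loop's output digit for digit:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))

set.seed(1234)
# Draw 1000 values with replacement in one call instead of a loop
b <- sample(values_all, size = 1000, replace = TRUE)

# Simulated relative frequency of each value
sim_probs <- prop.table(table(b))
sim_probs
```

With more draws (a larger size), the simulated proportions settle closer to the empirical frequencies from prop.table(table(values_all)).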
For question 1 you can use this:
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5
probability <- sum(values_all == prob_to_find) / length(values_all)
The probability is the number of times the value occurs (sum(values_all == prob_to_find)) divided by the total number of values in your set. For prob_to_find <- 5 this is 4 / 20 = 0.2.
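Because TRUE counts as 1 and FALSE as 0 in arithmetic, the same count-divided-by-length can be written more compactly with mean(). A minimal sketch, equivalent to the sum() / length() line above:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

# values_all == prob_to_find is a logical vector (TRUE where the value matches);
# its mean is the proportion of TRUEs, i.e. matches / length(values_all)
probability <- mean(values_all == prob_to_find)
probability
#> [1] 0.2
```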
For question 2, I left a comment on your question because I need some extra information.