Reputation: 665
I am using the prob package in R to calculate a conditional probability.
My data set is:
Q1 Q2 Q3 Q4
1 1 0 0
0 0 0 0
0 1 0 1
0 1 0 1
I want to calculate P(Q2 = 1 | Q4 = 1). As far as I know, it should be 1. But when I use the following command in R
Prob(a,Q2==1,Q4==1)
it returns 0.5.
Why does it return 0.5? Is 0.5 correct? I am doubting my own answer.
My second question: if I change the data set to
Q1 Q2 Q3 Q4
1 1 0 0
1 0 1 0
0 1 0 1
1 1 1 1
When I use this data and calculate the same probability, it returns 1.
Why does the probability change when I have not changed Q2 and Q4?
I think it should be 1 in both cases.
Why does it change just because the other columns, Q1 and Q3, change? I think it should not change, since P(Q2 = 1 | Q4 = 1) does not depend on Q1 and Q3.
Upvotes: 1
Views: 1925
Reputation: 16277
The problem is that Prob uses intersect, which excludes duplicates. Rows 3 and 4 of your data are identical, so they are collapsed into a single row. The calculation it does is sum(intersect(A, B)$probs)/sum(B$probs), which is 0.25/0.5 = 0.5.
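To make the arithmetic concrete, here is a minimal base R sketch that mimics the calculation described above. It does not call the prob package and is not its actual source. The helper key and the assumption that the four rows are equally likely (0.25 each) are illustrative.

```r
# First data set from the question; rows 3 and 4 are identical.
a <- data.frame(Q1 = c(1, 0, 0, 0),
                Q2 = c(1, 0, 1, 1),
                Q3 = c(0, 0, 0, 0),
                Q4 = c(0, 0, 1, 1))
a$probs <- rep(1 / nrow(a), nrow(a))  # assumption: equally likely rows

A <- a[a$Q2 == 1, ]  # rows where the event holds
B <- a[a$Q4 == 1, ]  # rows where the condition holds

# Hypothetical helper: one string per row, used to compare whole rows.
key <- function(df) do.call(paste, df)

# Rows present in both A and B, with duplicate rows removed.
common <- unique(A[key(A) %in% key(B), ])

# Ratio after duplicates are dropped: 0.25 / 0.5
dedup_result <- sum(common$probs) / sum(B$probs)

# Ratio computed directly on the rows, keeping duplicates: 0.5 / 0.5
direct_result <- sum(a$probs[a$Q2 == 1 & a$Q4 == 1]) / sum(a$probs[a$Q4 == 1])

print(c(dedup = dedup_result, direct = direct_result))
```

The first value reproduces the 0.5 from the question, and the second is the expected 1.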
If you want the correct calculation, you have to use mutually exclusive outcomes, so that each distinct row appears only once and carries its total probability (here the 3rd row has a probability of 50%):
library(prob)

# The two identical rows are merged into one row with their combined probability
a <- read.table(text="Q1 Q2 Q3 Q4
1 1 0 0
0 0 0 0
0 1 0 1",header=TRUE,stringsAsFactors=FALSE)
a$probs <- c(0.25,0.25,0.5)
Prob(a,event=Q2==1,given=Q4==1)
[1] 1
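For larger data sets, merging duplicate rows by hand is tedious. The following sketch shows one possible way to do it in base R with aggregate, assuming each original row is equally likely. The name collapsed is illustrative, and the final ratio is computed directly rather than through Prob.

```r
# Original four-row data set, with rows 3 and 4 identical.
a <- data.frame(Q1 = c(1, 0, 0, 0),
                Q2 = c(1, 0, 1, 1),
                Q3 = c(0, 0, 0, 0),
                Q4 = c(0, 0, 1, 1))
a$probs <- rep(1 / nrow(a), nrow(a))  # assumption: equally likely rows

# Sum the probabilities of identical rows so each outcome appears once.
collapsed <- aggregate(probs ~ Q1 + Q2 + Q3 + Q4, data = a, FUN = sum)
print(collapsed)

# Conditional probability P(Q2 = 1 | Q4 = 1) on the collapsed table.
p <- sum(collapsed$probs[collapsed$Q2 == 1 & collapsed$Q4 == 1]) /
  sum(collapsed$probs[collapsed$Q4 == 1])
print(p)
```

The collapsed table has three rows whose probabilities still sum to 1, the same shape as the hand-built data frame.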
As for your second question, Prob is working correctly there: intersect does not remove any rows, because rows 3 and 4 are different.
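A quick way to check whether a data set is affected is to look for duplicate rows before assigning probabilities. This is a small base R sketch on the second data set, again assuming equally likely rows. The names b and has_duplicates are illustrative.

```r
# Second data set from the question.
b <- data.frame(Q1 = c(1, 1, 0, 1),
                Q2 = c(1, 0, 1, 1),
                Q3 = c(0, 1, 0, 1),
                Q4 = c(0, 0, 1, 1))

# TRUE if any row repeats an earlier row exactly.
has_duplicates <- any(duplicated(b))
print(has_duplicates)

b$probs <- rep(1 / nrow(b), nrow(b))  # assumption: equally likely rows

# With no duplicate rows, nothing is lost when whole rows are intersected.
p <- sum(b$probs[b$Q2 == 1 & b$Q4 == 1]) / sum(b$probs[b$Q4 == 1])
print(p)
```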
Upvotes: 2