user395882

Reputation: 665

prob package seems to miscalculate Conditional Probability?

I am using the prob package in R to calculate a conditional probability.

My data set is:

  Q1 Q2 Q3 Q4
  1  1  0  0
  0  0  0  0
  0  1  0  1
  0  1  0  1

I want to calculate P(Q2 = 1 | Q4 = 1), which as far as I know should be 1. But when I run the following command in R

Prob(a,Q2==1,Q4==1)

it returns 0.5.

Why does it return 0.5? Is 0.5 correct? I am doubting my own answer.

My second question: suppose I change the data set to

  Q1 Q2 Q3 Q4 
  1  1  0  0 
  1  0  1  0 
  0  1  0  1 
  1  1  1  1 

When I compute the same probability on this data, it returns 1. Why does the probability change when Q2 and Q4 are unchanged? I think it should be 1 in both cases.

How can it change just because the other columns, Q1 and Q3, changed? I think it should not change, since P(Q2 = 1 | Q4 = 1) does not depend on Q1 and Q3.

Upvotes: 1

Views: 1925

Answers (1)

Pierre Lapointe

Reputation: 16277

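The problem is that Prob uses intersect, which drops duplicate rows. The calculation it performs is sum(intersect(A, B)$probs)/sum(B$probs), where A holds the rows with Q2==1 and B the rows with Q4==1. The two identical rows are counted once in the numerator but twice in the denominator, which gives 0.25/0.5 = 0.5.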
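The arithmetic above can be reproduced in base R without the prob package. This is only a sketch of the behaviour described here, not the package's actual code: unique() is used as a stand-in for the duplicate-dropping intersection, and the equal row probabilities of 0.25 are an assumption about how the sample space was set up.

```r
# First data set from the question, with four equally likely rows (assumed).
a <- read.table(text = "Q1 Q2 Q3 Q4
1 1 0 0
0 0 0 0
0 1 0 1
0 1 0 1", header = TRUE)
a$probs <- rep(0.25, nrow(a))

A <- a[a$Q2 == 1, ]  # rows where the event holds
B <- a[a$Q4 == 1, ]  # rows where the condition holds

# Rows of A that also satisfy the condition, with identical rows collapsed.
# unique() stands in for the set-style intersection described above.
AB <- unique(A[A$Q4 == 1, ])

numerator <- sum(AB$probs)   # the duplicated row is counted once
denominator <- sum(B$probs)  # both rows of B are counted
numerator / denominator
```

With this data the numerator is 0.25 and the denominator is 0.5, which matches the 0.5 reported in the question.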

If you want the correct result, make the rows unique and give each one its total probability, like so (the two identical rows are merged into a 3rd line with a probability of 50%):

library(prob)

a <- read.table(text="Q1 Q2 Q3 Q4
  1  1  0  0
  0  0  0  0
  0  1  0  1", header=TRUE, stringsAsFactors=FALSE)
a$probs <- c(0.25, 0.25, 0.5)  # the merged row carries 0.25 + 0.25

Prob(a, event=Q2==1, given=Q4==1)
[1] 1

As for your second question, Prob works as expected there: lines 3 and 4 differ (in Q1 and Q3), so intersect has no duplicates to remove.
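As a cross-check that does not depend on how duplicates are handled, the conditional probability can also be computed directly from logical masks in base R. This is a minimal sketch: cond_prob is a hypothetical helper, not part of the prob package, and each row is assumed to have probability 0.25.

```r
# Hypothetical helper: P(event | given) as a ratio of summed row probabilities.
cond_prob <- function(space, event, given) {
  sum(space$probs[event & given]) / sum(space$probs[given])
}

# First data set from the question (rows 3 and 4 are identical).
first <- read.table(text = "Q1 Q2 Q3 Q4
1 1 0 0
0 0 0 0
0 1 0 1
0 1 0 1", header = TRUE)
first$probs <- rep(0.25, nrow(first))

# Second data set from the question (all rows distinct).
second <- read.table(text = "Q1 Q2 Q3 Q4
1 1 0 0
1 0 1 0
0 1 0 1
1 1 1 1", header = TRUE)
second$probs <- rep(0.25, nrow(second))

p_first <- cond_prob(first, first$Q2 == 1, first$Q4 == 1)
p_second <- cond_prob(second, second$Q2 == 1, second$Q4 == 1)
c(p_first, p_second)
```

Both values come out as 1, in line with the expectation that P(Q2 = 1 | Q4 = 1) does not depend on Q1 and Q3.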

Upvotes: 2
