Reputation: 231
I want to calculate conditional mutual information in R, using the package infotheo.
I tried two ways to calculate I(X;Y1,Y2|Z). The first uses the following code:
condinformation(X$industry,cbind(X$ethnicity,X$education),S=X$gender, method="emp")
[1] -1.523344
Since conditional mutual information can be decomposed into two conditional entropies, I(X;Y1,Y2|Z) = H(X|Z) - H(X|Z,Y1,Y2), I also tried the following code:
hhh  <- condentropy(X$industry, Y = X$gender, method = "emp")
hhh1 <- condentropy(X$industry, Y = cbind(X$gender, X$ethnicity, X$education), method = "emp")
hhh-hhh1
[1] 0.1483363
I am wondering why these two approaches give different results.
Upvotes: 0
Views: 1545
Reputation: 9506
The two methods are different estimators of the same quantity, so they can give different results. In the same way, the following two estimators of Var(a + b) disagree even for independent a and b, because var(a + b) also picks up the sample covariance term 2 * cov(a, b), which is not exactly zero in a finite sample:
> a <- rnorm(100)
> b <- rnorm(100)
> var(a+b)-(var(a)+var(b))
[1] 0.5219229
I am not sure which estimator is better in your case, but I would guess the first one. You could run some simulations from your model to get a better idea.
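A minimal sketch of such a simulation, assuming a toy model in which all four variables are independent categorical variables (stand-ins for your columns; under this model the true conditional mutual information is 0), so you can see how far apart the two estimators land:

```r
# Sketch: compare the two estimators on data simulated from an assumed
# independence model, where the true value of I(X; Y1, Y2 | Z) is 0.
library(infotheo)

set.seed(1)
n  <- 1000
x  <- sample(1:4, n, replace = TRUE)  # stand-in for industry
y1 <- sample(1:3, n, replace = TRUE)  # stand-in for ethnicity
y2 <- sample(1:3, n, replace = TRUE)  # stand-in for education
z  <- sample(1:2, n, replace = TRUE)  # stand-in for gender

# Estimator 1: condinformation directly
est1 <- condinformation(x, cbind(y1, y2), S = z, method = "emp")

# Estimator 2: difference of two conditional entropies
est2 <- condentropy(x, Y = z, method = "emp") -
        condentropy(x, Y = cbind(z, y1, y2), method = "emp")

c(est1, est2)  # the two estimates need not agree in a finite sample
```

Repeating this over many simulated data sets (and, if possible, from a model closer to your real data) would show which estimator is less biased here.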
Upvotes: 2