Average distance of each variable

Question

The average distance of each variable may be calculated using the formula given below. Here d is the average distance between the variable of interest and its parent variables, p and q are the conditional probabilities of this variable for the different states of its parents, i indexes the states of the child node, and n is the number of states of the set of parent nodes.

Here is an example with two-state parents. What I am trying to calculate is:

   Average {[(0.8286-0.6308)^2],[(0.1364-0.2347)^2],...,[(0.0017-0.0049)^2]}
   =0.0107

When I have three or more states, I need to find:

   Average {[(a-b)^2+(a-c)^2+(b-c)^2],...}

I tried:

      x1<-c(0.8286,0.1364,0.0300,0.0033,0.0017)

      x2<-c(0.6308,0.2347,0.0807,0.0489,0.0049)

      dist(rbind(x1,x2))

But it just gives me the Euclidean distance.

Zheyuan Li · Accepted Answer

Sorry, I misunderstood the question at first. Here is what you can do:

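d <- function(mat) {
  # rows of mat are parent states, columns are child states
  ind <- as.numeric(combn(nrow(mat), 2))  # all row pairs, flattened as (a, b, a, b, ...)
  n <- length(ind) / 2                    # number of row pairs
  first <- ind[seq(from = 1, length.out = n, by = 2)]
  second <- ind[seq(from = 2, length.out = n, by = 2)]
  # per column: sum of squared differences over all row pairs; then average
  mean(apply(mat, 2, function(x) sum((x[first] - x[second])^2))) / n
}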
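To see how the pairing inside d works, it may help to look at what combn returns for a small case. The sketch below assumes a table with 3 rows (3 parent states); the variable name pairs is only for illustration and does not appear in the function above.

```r
# Every pair of row indices for a 3-row table, one pair per column
pairs <- combn(3, 2)
pairs
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# Flattening goes column by column, so each pair stays adjacent
flat <- as.numeric(pairs)
flat  # 1 2 1 3 2 3
```

Read two at a time, the flattened vector gives the pairs (1,2), (1,3) and (2,3), so the odd positions hold the first row of each pair and the even positions hold the second.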

For example, suppose this is your probability table:

set.seed(0); mat <- matrix(runif(20), 4, 5)

#           [,1]      [,2]       [,3]      [,4]      [,5]
# [1,] 0.8966972 0.9082078 0.66079779 0.1765568 0.4976992
# [2,] 0.2655087 0.2016819 0.62911404 0.6870228 0.7176185
# [3,] 0.3721239 0.8983897 0.06178627 0.3841037 0.9919061
# [4,] 0.5728534 0.9446753 0.20597457 0.7698414 0.3800352

d(mat) # 0.1775407

For your example data of 2 states:

x1<-c(0.8286,0.1364,0.0300,0.0033,0.0017)
x2<-c(0.6308,0.2347,0.0807,0.0489,0.0049)
d(rbind(x1,x2))  # 0.01068956
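As a cross-check, the same number can also be reached from the dist() call in the question. dist() returns the Euclidean distance between each pair of rows, so squaring those distances, summing them, and dividing by the number of columns times the number of row pairs should give the same average. This is only a sketch: d_dist is a hypothetical helper name, not part of the original answer.

```r
# Hypothetical alternative: the same average built from dist()
d_dist <- function(mat) {
  n_pairs <- choose(nrow(mat), 2)      # number of row pairs
  sq <- as.numeric(dist(mat))^2        # squared Euclidean distance for each row pair
  sum(sq) / (ncol(mat) * n_pairs)      # average over columns and pairs
}

x1 <- c(0.8286, 0.1364, 0.0300, 0.0033, 0.0017)
x2 <- c(0.6308, 0.2347, 0.0807, 0.0489, 0.0049)
d_dist(rbind(x1, x2))  # 0.01068956
```

With only two rows there is a single pair, so this reduces to the squared Euclidean distance divided by the number of child states.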
