Generating a matrix by applying a function to all possible combinations of variables in R

Question

Here is my small dataset and here is a function:

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

The function

syfun <- function (x, y){

if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
    sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
    sxy = 0
}
return(sxy)
}

out <- rep (NA, NROW(dat))

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,1])
}

mean(out)
1

Here I am applying the function to the first column (variable A1) paired with itself, row by row, and averaging the output values. I want to save this average in one cell of a matrix.

Similarly for A1 and B1:

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,2])
}

mean(out)
0.25

Now, similar to a correlation matrix, I want to compute this value for every pair of variables and store the results in a matrix like this:

         A1    B1    C1
A1       1.0  0.25  0.5
B1       0.25  1.0  NA
C1       0.5   NA   1.0

Edit: a more complete function that does not produce NAs:

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  if (x == "BB" & y == "AB"| x == "AB" & y == "BB"){
        sxy = 0.5
  }

  if(x == "AB" & y ==  "AB") {
    sxy = 0.5
    }
  return(sxy)
}

Sven Hohenstein · Accepted Answer

First, your function syfun has to return NA if there is no match. Hence, I added a line at the top of the function:

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  return(sxy)
}
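
To illustrate why the initial NA assignment matters, here is a minimal sketch. The helpers no_default and with_default are hypothetical and only mimic the structure of syfun: without a default, the function fails with an "object not found" error whenever no condition matches, whereas with the default it returns NA.

```r
# Hypothetical helpers for illustration only; they are not part of the question's code.
no_default <- function(x) {
  if (x == "a") score <- 1
  return(score)          # 'score' was never created when x != "a"
}

with_default <- function(x) {
  score <- NA            # fallback value when nothing matches
  if (x == "a") score <- 1
  return(score)
}

# tryCatch turns the error into a plain string so the two cases can be compared
res_no  <- tryCatch(no_default("b"), error = function(e) "error")
res_yes <- with_default("b")
```

In a clean session, res_no holds the string "error" and res_yes is NA.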

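Second, you can use outer to apply the function to all combinations. Because outer calls its function once with whole vectors of arguments rather than once per pair, you need Vectorize to turn a function that works on single values into one that works element by element: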

mat <- outer(names(dat), names(dat), function(x, y) 
  Vectorize(function(a, b) mean(Vectorize(syfun)(dat[[a]], dat[[b]])))(x,y))
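
As a small sketch of why Vectorize is needed here, the following counts how often outer invokes its function. The counter n_calls and the toy arithmetic are assumptions made up for this illustration and are unrelated to dat or syfun.

```r
n_calls <- 0

res <- outer(1:2, 1:3, function(x, y) {
  n_calls <<- n_calls + 1   # count invocations made by outer()
  x * 10 + y                # already works element-wise, so no Vectorize() needed
})

n_calls    # 1: outer() passed all six (x, y) pairs in a single call
dim(res)   # 2 3
```

A function built on if, such as syfun, cannot handle such length-6 inputs directly, which is why it is wrapped in Vectorize before being handed to outer.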

Third, replace the elements on the diagonal with 1:

diag(mat) <- 1

Fourth, set row and column names:

dimnames(mat) <- list(names(dat), names(dat))

The result:

     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00  NA
C1 0.50   NA 1.0
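
For comparison, here is a sketch of a different route to the same matrix that avoids Vectorize by using a named lookup vector and nested sapply calls. This is not the method used above; the names scores, pair_mean, vars and mat2 are made up for this example, and the table only covers the pairs scored by the three-branch syfun, so any other pair yields NA.

```r
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  C1 = c("AB", "BB", "AA", "AB"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one entry per scored pair, keyed as "x/y"
scores <- c("AA/AA" = 1,   "BB/BB" = 1,
            "AA/AB" = 0.5, "AB/AA" = 0.5,
            "AA/BB" = 0,   "BB/AA" = 0)

# Mean score between two columns; unknown keys index to NA, so the mean is NA
pair_mean <- function(a, b) {
  keys <- paste(dat[[a]], dat[[b]], sep = "/")
  mean(unname(scores[keys]))
}

vars <- names(dat)
# Inner sapply fills one column (fixed b), outer sapply binds the columns
mat2 <- sapply(vars, function(b) sapply(vars, function(a) pair_mean(a, b)))
diag(mat2) <- 1
mat2
```

Because sapply keeps the character names, mat2 already carries the row and column names, so no separate dimnames step is needed.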
