Converting a function of nested R for-loops to apply() family functions for efficiency

Question

I have two similar data sets:

d1 <- tribble(
  ~individual, ~X1, ~X2, ~X3,
  "p1", "XX", "XY", "YY",
  "p2", "XY", "XY", "YY",
  "p3", "YY", "XX", "XX"
)

d2 <- tribble(
  ~individual, ~X1, ~X2, ~X3,
  "p1", "XX", "XY", "YY",
  "p2", "XY", "XY", "YY",
  "p3", "YY", "XX", "XX",
  "p4", "YY", "XX", "XX",
  "p5", "YY", "XX", "XX"
)

I wrote a function to compare d1 to d2. It takes each individual in d1 and compares it to every individual in d2, column by corresponding column. Each column comparison gets a score, and the function returns the mean score for every pair of individuals (one row per d1/d2 pair).

scoreData <- function(d1, d2) {
 require(tidyverse)  # provides case_when()
 output <- data.frame()  # one row is appended per (name1, name2) pair

 colNames <- names(d1)[-1]  # genotype columns, i.e. everything except "individual"

 for(i in 1:nrow(d1)){
  name1 <- d1$individual[i]
  for(j in 1:nrow(d2)){
   name2 <- d2$individual[j]
   scores <- NULL
   for(k in 1:length(colNames)){
    col <- colNames[k]
    a <- d1[[col]][i]  # genotype of individual i in d1
    b <- d2[[col]][j]  # genotype of individual j in d2
    score <- case_when(
     a == "XX" && b == "XX" ~ 1.0,
     a == "XX" && b == "XY" ~ 0.5,
     a == "XX" && b == "YY" ~ 0.0,
     a == "YY" && b == "XX" ~ 0.0,
     a == "YY" && b == "XY" ~ 0.5,
     a == "YY" && b == "YY" ~ 1.0,
     a == "XY" && b == "XX" ~ 0.5,
     a == "XY" && b == "XY" ~ 0.5,
     a == "XY" && b == "YY" ~ 0.5
     )
    scores <- append(scores, score)
    }
   meanScore <- mean(scores, na.rm = TRUE)
   # data.frame() rather than cbind() so meanScore stays numeric
   output <- rbind(output, data.frame(name1, name2, meanScore))
   }
  }
 return(output)
 }

The problem is that my real datasets are very large, and I need to make my code more efficient. I have read that the apply() family of functions can be more efficient than for loops in R, but I am not sure how to use them to replicate this nested for loop. Eventually, I would like to parallelize the apply calls to make the scoring function faster still. Any ideas or help would be greatly appreciated.
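To make the target concrete, here is a rough sketch of the shape I think a loop-free version might take. It is only an assumption on my part, not a tested solution for large data: the names score_lookup and score_data_vec are made up for illustration, the nine case_when() rules are stored in a 3x3 lookup matrix, and mapply() runs over every (i, j) pair of rows. It uses base R data frames so that it runs without the tidyverse.

```r
# Same toy data as above, as base R data frames.
d1 <- data.frame(
  individual = c("p1", "p2", "p3"),
  X1 = c("XX", "XY", "YY"),
  X2 = c("XY", "XY", "XX"),
  X3 = c("YY", "YY", "XX"),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  individual = c("p1", "p2", "p3", "p4", "p5"),
  X1 = c("XX", "XY", "YY", "YY", "YY"),
  X2 = c("XY", "XY", "XX", "XX", "XX"),
  X3 = c("YY", "YY", "XX", "XX", "XX"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: score_lookup[a, b] is the score for
# genotype a (from d1) against genotype b (from d2).
genotypes <- c("XX", "XY", "YY")
score_lookup <- matrix(
  c(1.0, 0.5, 0.0,
    0.5, 0.5, 0.5,
    0.0, 0.5, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(genotypes, genotypes)
)

# Hypothetical replacement for scoreData(); assumes both inputs have
# the same genotype columns in the same order after "individual".
score_data_vec <- function(d1, d2) {
  m1 <- as.matrix(d1[, -1])  # character matrix of d1 genotypes
  m2 <- as.matrix(d2[, -1])  # character matrix of d2 genotypes

  # All (i, j) row pairs; j varies fastest to match the loop order.
  pairs <- expand.grid(j = seq_len(nrow(m2)), i = seq_len(nrow(m1)))

  mean_score <- mapply(function(i, j) {
    # Two-column character matrix indexing looks up all columns at once.
    mean(score_lookup[cbind(m1[i, ], m2[j, ])], na.rm = TRUE)
  }, pairs$i, pairs$j)

  data.frame(
    name1 = d1$individual[pairs$i],
    name2 = d2$individual[pairs$j],
    meanScore = mean_score,
    stringsAsFactors = FALSE
  )
}

res <- score_data_vec(d1, d2)
```

Each pair is scored independently, so I assume the mapply() call is the part that could later be swapped for a parallel counterpart such as parallel::mcmapply(), but I have not tried that.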
