SHRram

Reputation: 4227

Loop for working with individual values in R

Here is my small dataset.

Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame (Indvidual, Parent1, Parent2)

  Indvidual Parent1 Parent2
1         A    <NA>    <NA>
2         B    <NA>    <NA>
3         C       A       B
4         D       A       C
5         E       C       D
6         F       C       D
7         G       C       D
8         H       E    <NA>
9         I       A       D
10        J    <NA>    <NA>

Consider only individuals who have one or two known parents. I need to derive each individual's score from the scores of its parents.

The rule is that an individual with at least one known parent (a non-NA name in the Parent1 or Parent2 column) gets 1 additional point on top of the score its parents have. If both parents are known, the higher of the two parent scores is used.

Here is an example:

Individual "A" has both parents unknown, so it gets a score of 0.
Individual "C" has both parents known (A and B), so it gets 0 (the maximum of its parents' scores) plus 1 (because at least one parent is known), which gives 1.

Thus the expected output from the data frame above (with explanation) is:

   Indvidual Parent1 Parent2 Scores Explanation
1          A    <NA>    <NA>      0 0 (Max of parent scores NA) + 0 (neither parent known)
2          B    <NA>    <NA>      0 0 (Max of parent scores NA) + 0 (neither parent known)
3          C       A       B      1 0 (Max of parent scores) + 1 (either parent known)
4          D       A       C      2 1 (Max of parent scores) + 1 (either parent known)
5          E       C       D      3 2 (Max of parent scores) + 1 (either parent known)
6          F       C       D      3 2 (Max of parent scores) + 1 (either parent known)
7          G       C       D      3 2 (Max of parent scores) + 1 (either parent known)
8          H       E    <NA>      4 3 (Max of parent scores) + 1 (either parent known)
9          I       A       D      3 2 (Max of parent scores) + 1 (either parent known)
10         J    <NA>    <NA>      0 0 (Max of parent scores NA) + 0 (neither parent known)

Explanation: as the loop goes on, it uses the Scores already calculated in earlier rows to find the max of the parent scores.

Edit (based on chase's question):

For example:

Individual C has two parents, A and B, whose Scores were calculated as 0 and 0
(rows 1 and 2 of the Scores column), so max(c(0, 0)) is 0.

Individual E has parents C and D, whose values in the Scores column (rows 3 and 4)
are 1 and 2, respectively. So max(c(1, 2)) is 2.
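
The rule described above could be sketched as a single pass over the rows. This is only a minimal sketch and it assumes that every parent appears in an earlier row than its children, which holds for the sample data but may not hold in general. The `scores` vector is an illustrative helper and not part of the original data.

```r
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Indvidual, Parent1, Parent2, stringsAsFactors = FALSE)

# Illustrative helper: one score per individual, looked up by name.
# Everyone starts at 0, which is already the answer when no parent is known.
scores <- setNames(numeric(nrow(mydf)), mydf$Indvidual)

for (i in seq_len(nrow(mydf))) {
  parents <- c(mydf$Parent1[i], mydf$Parent2[i])
  parents <- parents[!is.na(parents)]          # keep only the known parents
  if (length(parents) > 0) {
    # Assumption: the parents' scores were filled in by earlier iterations.
    scores[mydf$Indvidual[i]] <- max(scores[parents]) + 1
  }
}

mydf$Scores <- unname(scores)
print(mydf)
```

If parents can appear after their children, the rows would have to be ordered first (or the scores resolved repeatedly until none are missing) before a pass like this gives the right result.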

Upvotes: 2

Views: 164

Answers (2)

Thierry

Reputation: 18487

Individual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame (Individual, Parent1, Parent2, stringsAsFactors = FALSE)

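# Individuals with both parents unknown score 0; everyone else starts as NA.
mydf$Scores <- NA
mydf$Scores[rowSums(is.na(mydf[, c("Parent1", "Parent2")])) == 2] <- 0
# Repeat until every individual has a score.
while(any(is.na(mydf$Scores))){
  KnownScores <- mydf[!is.na(mydf$Scores), c("Individual", "Scores")]
  # Unscored individuals whose parents are all either scored or unknown.
  ToCalculate <- mydf[
    mydf$Parent1 %in% c(KnownScores$Individual, NA) & 
    mydf$Parent2 %in% c(KnownScores$Individual, NA) & 
    is.na(mydf$Scores), 
    c("Individual", "Parent1", "Parent2")]
  # Look up each parent's score by name; an unknown parent gives NA.
  Score1 <- KnownScores$Scores[match(ToCalculate$Parent1, KnownScores$Individual)]
  Score2 <- KnownScores$Scores[match(ToCalculate$Parent2, KnownScores$Individual)]
  # Higher parent score plus 1; pmax() keeps the result aligned with the rows of ToCalculate.
  ToCalculate$Score <- pmax(Score1, Score2, na.rm = TRUE) + 1
  mydf <- merge(mydf, ToCalculate[, c("Individual", "Score")], all.x = TRUE)
  mydf$Scores[!is.na(mydf$Score)] <- mydf$Score[!is.na(mydf$Score)]
  mydf$Score <- NULL
}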

Upvotes: 1

shhhhimhuntingrabbits

Reputation: 7475

Example using plyr and a recursive function

library(plyr)
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame (Indvidual, Parent1, Parent2)
scor.fun<-function(x,mydf){
    Explanation<-0
    P1<-as.character(x$Parent1)
    P2<-as.character(x$Parent2)
    score<-as.numeric(!(is.na(P1)&&is.na(P2)))
    if(!(is.na(P1)||is.na(P2))){
        Explanation<-max(scor.fun(subset(mydf,Indvidual==P1),mydf)[1],scor.fun(subset(mydf,Indvidual==P2),mydf)[1])
        score<-score+Explanation
    }else{
        Explanation<-ifelse(is.na(P1),0,scor.fun(subset(mydf,Indvidual==P1),mydf)[1])
        Explanation<-max(Explanation,ifelse(is.na(P2),0,scor.fun(subset(mydf,Indvidual==P2),mydf)[1]))
        score<-score+Explanation
    }
    c(score,Explanation)
}

adply(mydf,1,scor.fun,mydf)

Recursion is probably not the best idea on a big data frame, since every call recomputes the scores of all of an individual's ancestors.
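
One way to keep the recursive idea while avoiding repeated work might be to cache each score the first time it is computed. The following is a minimal sketch under that assumption, using base R only. The names `score_all`, `score_of` and `cache` are illustrative and do not come from the code above. It also treats a parent that is not listed as an individual as having score 0.

```r
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Indvidual, Parent1, Parent2, stringsAsFactors = FALSE)

# score_all(), score_of() and cache are illustrative names.
score_all <- function(df) {
  cache <- new.env()  # maps an individual's name to its computed score

  score_of <- function(id) {
    # Reuse the score if this individual has already been resolved.
    if (exists(id, envir = cache, inherits = FALSE)) {
      return(get(id, envir = cache))
    }
    rec <- df[df$Indvidual == id, ]
    parents <- c(rec$Parent1, rec$Parent2)
    parents <- parents[!is.na(parents)]
    # No known parent: 0. Otherwise 1 plus the higher parent score.
    s <- if (length(parents) == 0) {
      0
    } else {
      1 + max(vapply(parents, score_of, numeric(1)))
    }
    assign(id, s, envir = cache)
    s
  }

  vapply(df$Indvidual, score_of, numeric(1), USE.NAMES = FALSE)
}

mydf$Scores <- score_all(mydf)
print(mydf)
```

With the cache, each individual is scored once no matter how many descendants refer to it. The sketch does not guard against cycles in the parent columns.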

Upvotes: 2
