Here is my small dataset.
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame (Indvidual, Parent1, Parent2)
   Indvidual Parent1 Parent2
1          A    <NA>    <NA>
2          B    <NA>    <NA>
3          C       A       B
4          D       A       C
5          E       C       D
6          F       C       D
7          G       C       D
8          H       E    <NA>
9          I       A       D
10         J    <NA>    <NA>
Only individuals with one or two known parents gain points. I need to derive each individual's score from the scores its parents already have.
The rule is: if at least one parent (a name in the Parent1 or Parent2 column) is known (not NA), the individual gets 1 point plus its parent's score. If both parents are known, the higher of the two parents' scores is used. An individual with neither parent known scores 0.
Here is an example:
Individual "A" has both parents unknown, so it gets a score of 0.
Individual "C" has both parents known (A and B), so it gets
0 (the maximum of its parents' scores)
plus 1 (because at least one parent is known), for a score of 1.
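The rule for a single individual can be written as a small R function. This is a minimal sketch: `rule_score` is a hypothetical helper name, and it assumes the caller passes the parents' scores with NA standing for an unknown parent.

```r
# Hypothetical helper: score of one individual, given its parents' scores
# (NA marks an unknown parent)
rule_score <- function(parent_scores) {
  known <- parent_scores[!is.na(parent_scores)]
  if (length(known) == 0) {
    return(0)          # neither parent known
  }
  max(known) + 1       # best known parent's score, plus 1
}

rule_score(c(NA, NA))  # individual A
rule_score(c(0, 0))    # individual C (parents A and B)
rule_score(c(3, NA))   # individual H (parent E only)
```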
Thus the expected output for the data frame above (with explanation) is:
   Indvidual Parent1 Parent2 Scores Explanation
1          A    <NA>    <NA>      0 0 (max of parent scores: NA) + 0 (neither parent known)
2          B    <NA>    <NA>      0 0 (max of parent scores: NA) + 0 (neither parent known)
3          C       A       B      1 0 (max of parent scores) + 1 (at least one parent known)
4          D       A       C      2 1 (max of parent scores) + 1 (at least one parent known)
5          E       C       D      3 2 (max of parent scores) + 1 (at least one parent known)
6          F       C       D      3 2 (max of parent scores) + 1 (at least one parent known)
7          G       C       D      3 2 (max of parent scores) + 1 (at least one parent known)
8          H       E    <NA>      4 3 (max of parent scores) + 1 (at least one parent known)
9          I       A       D      3 2 (max of parent scores) + 1 (at least one parent known)
10         J    <NA>    <NA>      0 0 (max of parent scores: NA) + 0 (neither parent known)
Explanation: as the loop goes down the rows, each row uses the Scores already calculated for its parents and takes the maximum of those parent scores.
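A minimal sketch of that row-by-row loop, assuming (as in this dataset) that every parent's row comes before its children's rows, so the parent scores are already filled in when a child is reached. Parents are looked up with match(); the names `parents` and `parent_scores` are illustrative only.

```r
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Indvidual, Parent1, Parent2, stringsAsFactors = FALSE)

# Assumes each parent's row precedes its children's rows
mydf$Scores <- NA_real_
for (i in seq_len(nrow(mydf))) {
  parents <- c(mydf$Parent1[i], mydf$Parent2[i])
  parents <- parents[!is.na(parents)]          # keep only known parents
  if (length(parents) == 0) {
    mydf$Scores[i] <- 0                        # neither parent known
  } else {
    parent_scores <- mydf$Scores[match(parents, mydf$Indvidual)]
    if (anyNA(parent_scores)) {
      stop("a parent's row comes after its child's row; reorder first")
    }
    mydf$Scores[i] <- max(parent_scores) + 1   # best parent score + 1
  }
}
mydf
```

If the rows are not already in parent-before-child order, the check above stops rather than silently producing NA scores.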
Edit (based on Chase's question):
For example:
Individual C has two parents, A and B, whose Scores (rows 1 and 2 of the Scores column) are 0 and 0,
so max(c(0, 0)) is 0 and C scores 0 + 1 = 1.
Individual E has parents C and D, whose Scores (rows 3 and 4) are
1 and 2, respectively, so max(c(1, 2)) is 2 and E scores 2 + 1 = 3.
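The same lookups can be written with a named vector of the scores already computed. This is a sketch; `scores` is just an illustrative name.

```r
# Scores already computed for rows 1-4 (A, B, C, D), indexed by name
scores <- c(A = 0, B = 0, C = 1, D = 2)

max(scores[c("A", "B")]) + 1  # individual C: max(0, 0) + 1
max(scores[c("C", "D")]) + 1  # individual E: max(1, 2) + 1
```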
Individual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Individual, Parent1, Parent2, stringsAsFactors = FALSE)

# Individuals with no known parent score 0; everyone else starts as NA
mydf$Scores <- NA
mydf$Scores[rowSums(is.na(mydf[, c("Parent1", "Parent2")])) == 2] <- 0

# Each pass scores the rows whose known parents already have a score.
# (This loops forever if a parent is missing from Individual or the pedigree has a cycle.)
while (any(is.na(mydf$Scores))) {
  KnownScores <- mydf[!is.na(mydf$Scores), c("Individual", "Scores")]
  ToCalculate <- mydf[
    mydf$Parent1 %in% c(KnownScores$Individual, NA) &
      mydf$Parent2 %in% c(KnownScores$Individual, NA) &
      is.na(mydf$Scores),
    c("Individual", "Parent1", "Parent2")]
  # Attach each parent's score (Scores.x for Parent1, Scores.y for Parent2)
  Merged <- merge(
    merge(ToCalculate, KnownScores,
          by.x = "Parent1", by.y = "Individual", all.x = TRUE),
    KnownScores,
    by.x = "Parent2", by.y = "Individual", all.x = TRUE)
  # Score the merged rows directly, because merge() reorders them
  Merged$Score <- apply(Merged[, c("Scores.x", "Scores.y")], 1, max, na.rm = TRUE) + 1
  mydf <- merge(mydf, Merged[, c("Individual", "Score")], all.x = TRUE)
  mydf$Scores[!is.na(mydf$Score)] <- mydf$Score[!is.na(mydf$Score)]
  mydf$Score <- NULL
}
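The same round-by-round idea can also be sketched without merge(), by looking up each parent's row once with match() and, on every pass, filling in the rows whose known parents already have scores. This is an alternative sketch, not the answer's code; it assumes every named parent also appears in Individual, and it stops with an error instead of looping forever when a pass makes no progress.

```r
Individual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Individual, Parent1, Parent2, stringsAsFactors = FALSE)

# Row index of each parent (NA when the parent is unknown)
p1 <- match(mydf$Parent1, mydf$Individual)
p2 <- match(mydf$Parent2, mydf$Individual)

# Founders (no known parent) score 0; the rest are filled in pass by pass
mydf$Scores <- ifelse(is.na(mydf$Parent1) & is.na(mydf$Parent2), 0, NA_real_)
while (anyNA(mydf$Scores)) {
  s1 <- mydf$Scores[p1]  # NA if Parent1 unknown or not scored yet
  s2 <- mydf$Scores[p2]
  # A row is ready when every known parent already has a score
  ready <- is.na(mydf$Scores) &
    (is.na(mydf$Parent1) | !is.na(s1)) &
    (is.na(mydf$Parent2) | !is.na(s2))
  if (!any(ready)) stop("no progress: a cycle or a parent missing from Individual")
  mydf$Scores[ready] <- pmax(s1[ready], s2[ready], na.rm = TRUE) + 1
}
mydf
```

Unlike the merge() version, this keeps the original row order, so no re-sorting is needed afterwards.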
Example using plyr and a recursive function:
library(plyr)
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Indvidual, Parent1, Parent2)
# Returns c(score, Explanation) for one row x, where Explanation is the
# highest score among the known parents (0 if neither parent is known)
scor.fun <- function(x, mydf) {
  Explanation <- 0
  P1 <- as.character(x$Parent1)
  P2 <- as.character(x$Parent2)
  # 1 if at least one parent is known, 0 if both are NA
  score <- as.numeric(!(is.na(P1) && is.na(P2)))
  if (!(is.na(P1) || is.na(P2))) {
    # both parents known: recurse into each and keep the higher score
    Explanation <- max(scor.fun(subset(mydf, Indvidual == P1), mydf)[1],
                       scor.fun(subset(mydf, Indvidual == P2), mydf)[1])
    score <- score + Explanation
  } else {
    # at most one parent known: recurse only into the known one
    Explanation <- if (is.na(P1)) 0 else scor.fun(subset(mydf, Indvidual == P1), mydf)[1]
    Explanation <- max(Explanation,
                       if (is.na(P2)) 0 else scor.fun(subset(mydf, Indvidual == P2), mydf)[1])
    score <- score + Explanation
  }
  c(score, Explanation)
}
adply(mydf, 1, scor.fun, mydf)
The recursion recomputes an ancestor's score every time that ancestor is reached, so this is probably not a good idea on a big data frame.
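One way around the repeated work is to memoize: store each individual's score the first time it is computed and reuse it afterwards. A minimal sketch, assuming an acyclic pedigree in which every named parent has its own row; `score_of` and `cache` are hypothetical names, not part of plyr.

```r
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA)
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA)
mydf <- data.frame(Indvidual, Parent1, Parent2, stringsAsFactors = FALSE)

# Hypothetical cache: individual name -> score already computed
cache <- new.env()

# Hypothetical helper: recursive score with memoization
score_of <- function(id) {
  if (exists(id, envir = cache, inherits = FALSE)) {
    return(get(id, envir = cache))             # reuse a stored score
  }
  row <- match(id, mydf$Indvidual)
  if (is.na(row)) stop("parent not found in Indvidual: ", id)
  parents <- c(mydf$Parent1[row], mydf$Parent2[row])
  parents <- parents[!is.na(parents)]          # keep only known parents
  s <- if (length(parents) == 0) {
    0                                          # neither parent known
  } else {
    max(vapply(parents, score_of, numeric(1))) + 1
  }
  assign(id, s, envir = cache)                 # remember for later calls
  s
}

mydf$Scores <- vapply(mydf$Indvidual, score_of, numeric(1), USE.NAMES = FALSE)
mydf
```

Each individual is scored once, so the work grows with the number of rows rather than with the number of paths through the pedigree. A cycle in the parent links would still recurse until R reports that evaluation is nested too deeply.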