Reputation: 29
I need to generate a new variable called Result in R, such that:
for each Variable.ID, if all Classification values are "yes", Result = "yes"; if all are "no", Result = "no"; otherwise Result = "undetermined".
Can anyone advise how I can do this? (There are hundreds of Variable.IDs, so no manual vector assignments, please.)
Upvotes: 2
Views: 97
Reputation: 1345
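# Return one label for a vector of classifications
foo <- function(x) {
  if (sum(x == "yes") == length(x)) {
    return("yes")
  } else if (sum(x == "no") == length(x)) {
    return("no")
  } else {
    return("undetermined")
  }
}
# Loop over rows (seq_along(data) would loop over columns)
data$Result <- NA_character_
for (i in seq_len(nrow(data))) {
  data$Result[i] <- foo(data$Classification[data$Variable.ID == data$Variable.ID[i]])
}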
Upvotes: 0
Reputation: 1007
You can split Classification by Variable.ID and check for all values being either yes or no:
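library(plyr)
# One label per Variable.ID group
results <- llply(split(d, d$Variable.ID), function(d2) {
  if (all(d2$Classification == 'yes')) {
    'yes'
  } else if (all(d2$Classification == 'no')) {
    'no'
  } else {
    'undetermined'
  }
})
# Index by name so IDs that are not 1, 2, 3, ... still match their group
d$Results <- factor(unlist(results[as.character(d$Variable.ID)]))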
...which should give you what you asked for:
> print(d)
   Variable.ID Classification      Results
1            1            yes          yes
2            1            yes          yes
3            1            yes          yes
4            1            yes          yes
5            1            yes          yes
6            2             no           no
7            2             no           no
8            2             no           no
9            2             no           no
10           3            yes undetermined
11           3             no undetermined
12           4           both undetermined
13           4           <NA> undetermined
14           4            yes undetermined
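If you want to reproduce this without plyr, a minimal base-R sketch might look like the following. The data frame d is reconstructed from the printed output above (an assumption, since the original input was not posted), and classify_group is a hypothetical helper name. The isTRUE() wrapper is an addition of mine: it makes a group such as c("yes", NA) fall through to "undetermined" instead of raising an error in if().

```r
# Input reconstructed from the printed output above (assumed, not the asker's real data)
d <- data.frame(
  Variable.ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
  Classification = c(rep("yes", 5), rep("no", 4), "yes", "no", "both", NA, "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one label per group.
# isTRUE() turns an NA result from all() into FALSE.
classify_group <- function(x) {
  if (isTRUE(all(x == "yes"))) {
    "yes"
  } else if (isTRUE(all(x == "no"))) {
    "no"
  } else {
    "undetermined"
  }
}

# One label per Variable.ID, returned as a character vector named by ID
per_id <- vapply(split(d$Classification, d$Variable.ID),
                 classify_group, character(1))

# Look up each row's label by name so non-consecutive IDs also work
d$Results <- unname(per_id[as.character(d$Variable.ID)])
```

Looking the label up by name rather than by position is a deliberate choice here: it should keep working when the IDs are not simply 1, 2, 3, and so on.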
Upvotes: 3
Reputation: 4427
This can be done with ave(), any(), all(), etc., though the question is not a good fit for Cross Validated. The following is a starter for you. You will have to change NA to "undetermined", but I tried to keep the code as easy to grasp as possible:
d <- data.frame(v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                clas = c("yes", "yes", "yes", "yes", "yes",
                         "no", "no", "no", "no"))
d$result <- ave(d$clas, d$v.id,
                FUN = function(x) {
                  if (all(x == "yes")) { return("yes") }
                  if (all(x == "no")) { return("no") }
                  else return(NA)
                })
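The change from NA to "undetermined" mentioned above could be sketched as follows. This assumes clas is stored as a character vector (hence stringsAsFactors = FALSE) and contains no missing values. label_group is a hypothetical helper name, not something from the original code.

```r
d <- data.frame(v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                clas = c("yes", "yes", "yes", "yes", "yes",
                         "no", "no", "no", "no"),
                stringsAsFactors = FALSE)

# Hypothetical helper: returns a single label per group;
# ave() recycles that label across every row of the group.
label_group <- function(x) {
  if (all(x == "yes")) return("yes")
  if (all(x == "no")) return("no")
  "undetermined"
}

d$result <- ave(d$clas, d$v.id, FUN = label_group)
# d$result is "yes" for v.id 1, "undetermined" for v.id 2, "no" for v.id 3
```

Because ave() returns a vector of the same length as its input, the result can be assigned straight back as a column with no merge or lookup step.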
Upvotes: 3