Kristina

Reputation: 29

Generating new variable in R based on group properties

I need to generate a new variable called Result in R, such that:

For each Variable.ID: if every Classification in that group equals "yes", Result = "yes"; if every Classification equals "no", Result = "no"; otherwise Result = "undetermined".

Can anyone advise me on how to do this? (There are hundreds of Variable.IDs, so no manual vector assignments, please.)

Upvotes: 2

Views: 97

Answers (3)

mkearney

Reputation: 1345

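# Classify one group; na.rm = TRUE makes a missing value count as a mismatch
foo <- function(x) {
  if (sum(x == "yes", na.rm = TRUE) == length(x)) {
    return("yes")
  } else if (sum(x == "no", na.rm = TRUE) == length(x)) {
    return("no")
  } else {
    return("undetermined")
  }
}

# Loop over rows (seq_along(data) would loop over columns instead)
data$Result <- NA_character_
for (i in seq_len(nrow(data))) {
  data$Result[i] <- foo(data$Classification[data$Variable.ID == data$Variable.ID[i]])
}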

Upvotes: 0

geekoverdose

Reputation: 1007

You can split Classification by Variable.ID and check for all values being either yes or no:

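library(plyr)
# One label per Variable.ID; isTRUE() treats an NA comparison as "not all equal"
results <- llply(split(d, d$Variable.ID), function(d2) {
  if (isTRUE(all(d2$Classification == 'yes'))) {
    'yes'
  } else if (isTRUE(all(d2$Classification == 'no'))) {
    'no'
  } else {
    'undetermined'
  }
})
# Index by ID name so non-consecutive IDs still map to the right group
d$Results <- factor(unlist(results[as.character(d$Variable.ID)]))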

...which should give you what you asked for:

> print(d)

   Variable.ID Classification      Results
1            1            yes          yes
2            1            yes          yes
3            1            yes          yes
4            1            yes          yes
5            1            yes          yes
6            2             no           no
7            2             no           no
8            2             no           no
9            2             no           no
10           3            yes undetermined
11           3             no undetermined
12           4           both undetermined
13           4           <NA> undetermined
14           4            yes undetermined
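
The same split-and-check idea can also be sketched in base R without plyr. This is only a minimal sketch under some assumptions: the data frame below is hypothetical sample data rebuilt from the printed output above, and classify_group and per_id are illustrative names, not part of the original answer.

```r
# Hypothetical sample data mirroring the printed output above
d <- data.frame(
  Variable.ID    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
  Classification = c(rep("yes", 5), rep("no", 4), "yes", "no", "both", NA, "yes"),
  stringsAsFactors = FALSE
)

# Summarise one group; %in% never returns NA, so a missing value
# simply counts as a mismatch
classify_group <- function(x) {
  if (all(x %in% "yes")) {
    "yes"
  } else if (all(x %in% "no")) {
    "no"
  } else {
    "undetermined"
  }
}

# One label per ID, as a character vector named by ID
per_id <- vapply(split(d$Classification, d$Variable.ID),
                 classify_group, character(1))

# Look up each row's label by ID name rather than by position
d$Results <- unname(per_id[as.character(d$Variable.ID)])
```

Looking the label up by name rather than by position means the IDs do not have to be the consecutive integers 1, 2, 3, ...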

Upvotes: 3

Bernhard

Reputation: 4427

This can be done with ave(), any(), all(), etc., though the question is not a good fit for Cross Validated. The following is a starter for you. You will have to change NA to "undetermined", but I tried to keep the code as easy to grasp as possible:

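# Example data: three IDs with three classifications each
d <- data.frame(v.id=c(1,1,1,2,2,2,3,3,3),
           clas=c("yes", "yes", "yes", "yes", "yes",
                  "no","no","no", "no"),
           stringsAsFactors=FALSE)  # keep clas as character on R < 4.0

# ave() applies FUN to each v.id group and recycles the result over its rows
d$result <- ave(d$clas, d$v.id, 
            FUN=function(x) {
              if(all(x=="yes")){ return("yes") }
              if(all(x=="no")) { return("no") }
              else return(NA)
            })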
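
For completeness, here is one way the change from NA to "undetermined" might look. It is a sketch, not the only option, and it assumes clas is a character column. The isTRUE() wrapper is an addition here so that a group containing a missing value falls through to "undetermined" instead of raising an error.

```r
# Same example data as above, kept as character rather than factor
d <- data.frame(
  v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  clas = c("yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# ave() calls FUN once per v.id group and recycles the single
# returned label across every row of that group
d$result <- ave(d$clas, d$v.id, FUN = function(x) {
  if (isTRUE(all(x == "yes"))) return("yes")
  if (isTRUE(all(x == "no"))) return("no")
  "undetermined"
})
```

With this data, group 1 is all "yes", group 2 is mixed, and group 3 is all "no".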

Upvotes: 3
