Reputation: 409

Conditional calculation based on data in other columns in R

Newbie: I have a data frame with 3 columns of categorical values, and I would like to add a fourth column whose values are calculated row by row from the values in the first 3 columns. So far I have:

tC <- textConnection("Visit1    Visit2  Visit3
yes no  no
yes no  yes
yes yes yes")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)
data1["pattern"] <- NA

Next, I would like to fill in column 4 so that if, for example, the values of Visit1, Visit2 and Visit3 in a row are "yes", "no" and "no", the NA in that row's pattern column is replaced by "1". In other languages this would be a FOR loop with some IF statements. I have looked at the apply family, but I am still not sure about the best approach and syntax for this in R. Thoughts appreciated.
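For the single rule described above, a loop may not be needed at all: R comparisons are vectorised over whole columns, so one logical vector can select every matching row at once. A minimal sketch, assuming the columns are read as character (stringsAsFactors = FALSE) and using the example code 1 from the question; is_yes_no_no is a hypothetical helper name:

```r
# Rebuild the question's example data with character columns
data1 <- data.frame(Visit1 = c("yes", "yes", "yes"),
                    Visit2 = c("no", "no", "yes"),
                    Visit3 = c("no", "yes", "yes"),
                    stringsAsFactors = FALSE)
data1$pattern <- NA

# TRUE for every row whose visits are "yes", "no", "no"
is_yes_no_no <- data1$Visit1 == "yes" & data1$Visit2 == "no" & data1$Visit3 == "no"

# Set pattern to 1 on those rows only; the other rows stay NA
data1$pattern[is_yes_no_no] <- 1
data1
```

Each additional rule would be one more logical vector and one more assignment of this form.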

Upvotes: 0

Answers (3)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193677

Assuming we are using @SimonO101's expanded sample data, I would suggest expand.grid and factor.

First, create every combination of "yes" and "no" that can occur across the three columns.

facLevs <- expand.grid(c("yes", "no"), c("yes", "no"), c("yes", "no"))
facLevs
#   Var1 Var2 Var3
# 1  yes  yes  yes
# 2   no  yes  yes
# 3  yes   no  yes
# 4   no   no  yes
# 5  yes  yes   no
# 6   no  yes   no
# 7  yes   no   no
# 8   no   no   no

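Now we turn the combination in each row into a factor. do.call(paste, ...) pastes the columns together row by row more easily than apply(mydf, ...), and using the pasted rows of facLevs as the factor levels fixes the numbering. Converting the factor with as.numeric then gives the numeric group.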

mydf$pattern <- as.numeric(factor(do.call(paste, mydf[1:3]), 
                                  do.call(paste, facLevs)))
mydf
#   Visit1 Visit2 Visit3 pattern
# 1    yes     no     no       7
# 2    yes     no    yes       3
# 3    yes    yes    yes       1
# 4     no    yes     no       6
# 5    yes     no    yes       3

As you can see, pattern = 7 corresponds to the combination in the seventh row of the facLevs data.frame that we created.
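Because the codes are row numbers of facLevs, indexing facLevs with them recovers each row's combination, which is a quick way to check the mapping. A sketch that rebuilds both objects with stringsAsFactors = FALSE (an assumption, so the check compares plain strings); decoded is a hypothetical name:

```r
# Lookup table of all yes/no combinations, as character columns
facLevs <- expand.grid(c("yes", "no"), c("yes", "no"), c("yes", "no"),
                       stringsAsFactors = FALSE)

# The expanded example data
mydf <- data.frame(Visit1 = c("yes", "yes", "yes", "no", "yes"),
                   Visit2 = c("no", "no", "yes", "yes", "no"),
                   Visit3 = c("no", "yes", "yes", "no", "yes"),
                   stringsAsFactors = FALSE)

# Pattern code = row number of the matching combination in facLevs
mydf$pattern <- as.numeric(factor(do.call(paste, mydf[1:3]),
                                  do.call(paste, facLevs)))

# Index facLevs by the codes to get each row's combination back
decoded <- facLevs[mydf$pattern, ]
decoded
```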

For convenience, here is mydf:

mydf <- structure(list(Visit1 = c("yes", "yes", "yes", "no", "yes"), 
                       Visit2 = c("no", "no", "yes", "yes", "no"), 
                       Visit3 = c("no", "yes", "yes", "no", "yes")), 
                  .Names = c("Visit1", "Visit2", "Visit3"), 
                  class = "data.frame", row.names = c("1", "2", "3", "4", "5"))

Upvotes: 2

Simon O'Hanlon

Reputation: 60000

I'm not sure this is the most efficient way to go about it, but we can find the unique rows and then, for each row in the data.frame, find which of the unique rows it matches. That index is the pattern ID. We have to collapse each row into a single string first, though, because match() compares individual elements rather than whole rows. The example below uses the slightly expanded example data:

#  Visit1 Visit2 Visit3
#1    yes     no     no
#2    yes     no    yes
#3    yes    yes    yes
#4     no    yes     no
#5    yes     no    yes

#  Get unique combinations of the three Visit columns
pats <- unique( data1[1:3] )

#  Collapse each row to a single string element
pats <- apply( pats , 1 , paste , collapse = " " )

#  Do the same to your data and compare with the patterns
data1$pattern <- apply( data1[1:3] , 1 , function(x) match( paste( x , collapse = " " ) , pats ) )
data1
#  Visit1 Visit2 Visit3 pattern
#1    yes     no     no       1
#2    yes     no    yes       2
#3    yes    yes    yes       3
#4     no    yes     no       4
#5    yes     no    yes       2
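The row-wise apply() calls can also be replaced by do.call(paste, ...), the trick used in the first answer, which pastes whole columns at once. A sketch under that swap, rebuilding the expanded example data with character columns; keys is a hypothetical name:

```r
data1 <- data.frame(Visit1 = c("yes", "yes", "yes", "no", "yes"),
                    Visit2 = c("no", "no", "yes", "yes", "no"),
                    Visit3 = c("no", "yes", "yes", "no", "yes"),
                    stringsAsFactors = FALSE)

# One string per row, e.g. "yes no no"
keys <- do.call(paste, data1[1:3])

# Number each row by the first appearance of its combination
data1$pattern <- match(keys, unique(keys))
data1
```

The numbering is the same as with the apply() version: patterns are numbered in order of first appearance.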

Upvotes: 3

alap

Reputation: 647

Updated: an answer using a for loop:

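# R passes arguments by value, so the modified data frame must be
# returned; changes made inside the function are otherwise lost
updateRow <- function(rIndex, data1) {
  if (data1[rIndex, 1] == "yes" &&
      data1[rIndex, 2] == "no" &&
      data1[rIndex, 3] == "no") {
    data1[rIndex, 4] <- 1
  }
  data1
}

# Loop over every row and keep the returned copy
for (i in seq_len(nrow(data1))) data1 <- updateRow(i, data1)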

You can change the if condition (or add more branches) to cover the other patterns. I hope this is what you want.
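Covering several patterns in the same loop can be sketched with an else if chain. The pattern codes 1 to 3 below are illustrative assumptions, and v is a hypothetical helper variable holding one row's visit values:

```r
data1 <- data.frame(Visit1 = c("yes", "yes", "yes"),
                    Visit2 = c("no", "no", "yes"),
                    Visit3 = c("no", "yes", "yes"),
                    stringsAsFactors = FALSE)
data1$pattern <- NA

for (i in seq_len(nrow(data1))) {
  # The three visit values of row i as a character vector
  v <- unlist(data1[i, 1:3])
  if (all(v == c("yes", "no", "no"))) {
    data1$pattern[i] <- 1
  } else if (all(v == c("yes", "no", "yes"))) {
    data1$pattern[i] <- 2
  } else if (all(v == c("yes", "yes", "yes"))) {
    data1$pattern[i] <- 3
  }
  # Rows matching none of the conditions keep NA
}
data1
```

This keeps the loop-and-if structure from the question; the other answers avoid writing one branch per pattern.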

Upvotes: 0
