Reputation: 417
Newbie: I have a data table with 3 columns of categorical values, and I would like to add a fourth column with values calculated by row based on the values of the first 3 columns. So far I have:
tC <- textConnection("Visit1 Visit2 Visit3
yes no no
yes no yes
yes yes yes")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)
data1["pattern"] <- NA
Next I would like to fill in column 4 such that if the values of Visit1, Visit2 and Visit3 are, for example, "yes", "no" and "no", NA would be replaced by "1" in the pattern column for that row. In other languages this would be a FOR loop with some IF statements. I have looked at the apply family, but I am still not quite sure about the best approach and syntax for this in R. Thoughts appreciated.
Upvotes: 0
Views: 1411
Reputation: 193507
Assuming we are using @SimonO101's expanded sample data, I would suggest expand.grid and factor.
First, create all the combinations that we are going to have of "yes" and "no" for three columns.
facLevs <- expand.grid(c("yes", "no"), c("yes", "no"), c("yes", "no"))
facLevs
# Var1 Var2 Var3
# 1 yes yes yes
# 2 no yes yes
# 3 yes no yes
# 4 no no yes
# 5 yes yes no
# 6 no yes no
# 7 yes no no
# 8 no no no
Now, we will factor the combinations of the columns. We can use do.call(paste, ...) to do this more easily than apply(mydf, ...). We will convert that to as.numeric to get the numeric group.
mydf$pattern <- as.numeric(factor(do.call(paste, mydf[1:3]),
do.call(paste, facLevs)))
mydf
# Visit1 Visit2 Visit3 pattern
# 1 yes no no 7
# 2 yes no yes 3
# 3 yes yes yes 1
# 4 no yes no 6
# 5 yes no yes 3
As you can see, pattern = 7 corresponds to the values we would find on the seventh row of the facLevs data.frame that we created.
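To make that correspondence concrete, here is a minimal sketch that uses the pattern number as a row index into facLevs to recover the combination each ID stands for. The name decoded is only illustrative and is not part of the original answer.

```r
# Sample data and lookup table, rebuilt here so the snippet runs on its own
mydf <- data.frame(Visit1 = c("yes", "yes", "yes", "no", "yes"),
                   Visit2 = c("no", "no", "yes", "yes", "no"),
                   Visit3 = c("no", "yes", "yes", "no", "yes"),
                   stringsAsFactors = FALSE)
facLevs <- expand.grid(c("yes", "no"), c("yes", "no"), c("yes", "no"))

# Same pattern calculation as in the answer
mydf$pattern <- as.numeric(factor(do.call(paste, mydf[1:3]),
                                  do.call(paste, facLevs)))

# Use the pattern ID as a row index into facLevs to decode it again
# ("decoded" is a hypothetical name used only for this illustration)
decoded <- facLevs[mydf$pattern, ]
decoded
```

Each row of decoded should repeat the Visit1, Visit2 and Visit3 values of the matching row in mydf, which is a quick way to check that the IDs mean what you expect.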
For convenience, here is mydf:
mydf <- structure(list(Visit1 = c("yes", "yes", "yes", "no", "yes"),
Visit2 = c("no", "no", "yes", "yes", "no"),
Visit3 = c("no", "yes", "yes", "no", "yes")),
.Names = c("Visit1", "Visit2", "Visit3"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
Upvotes: 2
Reputation: 59970
I'm not sure this is the most efficient way to go about this, but we can find the unique rows and then find for each row in the data.frame which of the unique rows it matches. This number is therefore the pattern ID. We have to collapse rows into single string elements though, otherwise R vectorisation gets in the way of what we want. The example below uses the slightly expanded example data:
# Visit1 Visit2 Visit3
#1 yes no no
#2 yes no yes
#3 yes yes yes
#4 no yes no
#5 yes no yes
# Get unique combinations of the three visit columns
pats <- unique( data1[1:3] )
# Collapse each row to a single string element
pats <- apply( pats , 1 , paste , collapse = " " )
# Do the same to your data and compare with the patterns
data1$pattern <- apply( data1[1:3] , 1 , function(x) match( paste( x , collapse = " " ) , pats ) )
# Visit1 Visit2 Visit3 pattern
#1 yes no no 1
#2 yes no yes 2
#3 yes yes yes 3
#4 no yes no 4
#5 yes no yes 2
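The same idea can probably be written without apply, since paste is already vectorised over columns. The following is only a sketch of that variant, not the code from the answer above; keys is a name introduced here for illustration.

```r
# Expanded example data, rebuilt so the snippet runs on its own
data1 <- data.frame(Visit1 = c("yes", "yes", "yes", "no", "yes"),
                    Visit2 = c("no", "no", "yes", "yes", "no"),
                    Visit3 = c("no", "yes", "yes", "no", "yes"),
                    stringsAsFactors = FALSE)

# Collapse each row of the three visit columns into one string
# ("keys" is a hypothetical helper name)
keys <- do.call(paste, data1[1:3])

# The pattern ID is the position of each key among the unique keys,
# so IDs are numbered in order of first appearance
data1$pattern <- match(keys, unique(keys))
data1
```

As with the apply version, the IDs depend on the order in which combinations first appear in the data, so the same combination can get a different number in a different data set.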
Upvotes: 3
Reputation: 647
Updated
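Answer with a for loop:
updateRow <- function(rIndex, data1) {
if ((data1[rIndex, 1] == "yes") &&
(data1[rIndex, 2] == "no") &&
(data1[rIndex, 3] == "no")) {
data1[rIndex, 4] <- 1
}
data1 # return the modified copy; R does not change the caller's data1 in place
}
# Loop over every row and keep the returned copy each time
for (i in seq_len(nrow(data1))) data1 <- updateRow(i, data1)
This assumes the pattern column already exists as column 4, as in the question. You can change the condition in the if as needed. I hope this is what you want.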
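For comparison, the same condition can also be expressed without an explicit loop by using logical indexing. This is a sketch under the assumption that the data look like the question's three-row example; isPattern1 is a name made up for this illustration.

```r
# Rebuild the question's data (read.table's text argument avoids the textConnection)
data1 <- read.table(header = TRUE, text = "Visit1 Visit2 Visit3
yes no no
yes no yes
yes yes yes")
data1$pattern <- NA

# TRUE for rows where the visits are yes, no, no
# ("isPattern1" is a hypothetical name)
isPattern1 <- data1$Visit1 == "yes" & data1$Visit2 == "no" & data1$Visit3 == "no"

# Assign 1 to all matching rows at once; other rows stay NA
data1$pattern[isPattern1] <- 1
data1
```

Further patterns could be handled by repeating the last two steps with a different condition and a different number.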
Upvotes: 0