Eva

Reputation: 339

How to apply ifelse function by column names?

I know there are many similar questions around, but I'm afraid I couldn't get my head around this particular one, though it is obviously very simple!

I am trying to write a simple ifelse function to be applied over a series of columns in a data frame, selecting them by column name (rather than by number). What I am trying to do is create a single u_all variable, as shown below, without typing the column names repeatedly.

dat <- data.frame(id = 1:20,
                  u1 = sample(0:1, 20, replace = TRUE),
                  u2 = sample(0:1, 20, replace = TRUE),
                  u3 = sample(0:1, 20, replace = TRUE))
dat <- within(dat, u_all <- ifelse(u1 == 1 | u2 == 1 | u3 == 1, 1, 0))
dat

I tried many variants of apply, but clearly I'm not on the right track: those functions run the ifelse on each column separately instead of combining the columns into one result.

dat2 <- data.frame(id=c(1:20),u1 = sample(c(0:1),20,replace=T) , u2 = sample(c(0:1),20,replace=T) , u3 = sample(c(0:1),20,replace=T)) 

dat2<-cbind(dat2,sapply(dat2[,grepl("^u\\d{1,}",colnames(dat2))],
                               function(x){ u_all<-ifelse(x==1 & !is.na(x),1,0)}))

dat2

Upvotes: 1

Views: 1994

Answers (2)

Heroka

Reputation: 13139

You were almost there. Here's a solution that uses apply over rows, with any and all reducing each row's vector of tests to a single 0/1 value.

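dat2$u_all <- apply(dat2[, -1], MARGIN = 1, FUN = function(x) {
  # Parenthesise the logical test: `*` binds tighter than `&`,
  # so without the parentheses the result stays TRUE/FALSE instead of 1/0.
  (any(x == 1) & all(!is.na(x))) * 1
})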
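To see how the row-wise test treats missing values, here is a minimal sketch on a small hand-made data frame. The names `toy` and `row_flag` are made up for illustration and are not from the question. Note that a row containing an NA gets 0 even when another column equals 1, which differs from a plain `|` across columns, where `TRUE | NA` is `TRUE`.

```r
# Hypothetical toy data: the third row has a 1 and a missing value.
toy <- data.frame(id = 1:3,
                  u1 = c(0, 1, 1),
                  u2 = c(0, 0, NA))

# Row-wise flag: 1 only if some value equals 1 AND no value is missing.
row_flag <- function(x) {
  # The logical test is parenthesised so that `* 1` converts its result to 0/1.
  (any(x == 1) & all(!is.na(x))) * 1
}

toy$u_all <- apply(toy[, -1], MARGIN = 1, FUN = row_flag)
toy
```

Whether a row with an NA should count as 0 depends on the data, so the `all(!is.na(x))` part may need adjusting.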

Upvotes: 3

Frank

Reputation: 66819

This line from the OP

dat<-within(dat,u_all<-ifelse (u1==1 | u2==1 |u3==1,1,0))

can instead be written as

dat$u_all <- +Reduce("|", dat[, c("u1", "u2", "u3")])

How it works, in terms of intermediate objects:

  • D = dat[, c("u1", "u2", "u3")] uses the names of the columns to subset the data frame.
  • r = Reduce("|", D) collapses the data by putting | between each pair of columns. The result is a logical (TRUE/FALSE) vector.
  • To convert r to a 0/1 integer vector, you could use ifelse(r,1L,0L) or as.integer(r) (since TRUE/FALSE converts to 1/0 by default) or just the unary +, like +r.
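The intermediate steps above can be traced on a small data frame with fixed values, so the result is easy to check by eye. This is only a sketch: `toy` is a made-up name, and the values are chosen by hand rather than sampled as in the question.

```r
# Hypothetical toy data with known values (not from the question).
toy <- data.frame(id = 1:4,
                  u1 = c(0, 1, 0, 0),
                  u2 = c(0, 0, 0, 1),
                  u3 = c(0, 1, 0, 0))

D <- toy[, c("u1", "u2", "u3")]  # subset the data frame by column name
r <- Reduce("|", D)              # equivalent to D$u1 | D$u2 | D$u3
toy$u_all <- +r                  # unary plus turns TRUE/FALSE into 1/0

toy
```

Rows 2 and 4 contain at least one 1, so only those rows should be flagged.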

If you want to avoid using column names (it's really not clear to me from the post), you can construct D = dat[-1] to exclude the first column instead.
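A middle ground, if the goal is to use names without typing each one, is to pick the columns by a name pattern and pass them to the same Reduce call. The sketch below assumes the relevant columns are all named "u" followed by digits, as in the question. `toy` and `u_cols` are made-up names.

```r
# Hypothetical toy data with known values (not from the question).
toy <- data.frame(id = 1:4,
                  u1 = c(0, 1, 0, 0),
                  u2 = c(0, 0, 0, 1),
                  u3 = c(0, 1, 0, 0))

# Names matching "u" followed only by digits; the anchored pattern
# deliberately does not match "id" or an existing "u_all" column.
u_cols <- grep("^u[0-9]+$", names(toy), value = TRUE)

toy$u_all <- +Reduce("|", toy[u_cols])
toy
```

Because the pattern is anchored at both ends, rerunning the last line after u_all has been created does not pull u_all into the selection.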

Upvotes: 5
