user2230555

Reputation: 445

R Generating a new variable based on conditional statement applied to many columns

There is probably an obvious and elegant way to do this, perhaps using lapply, but I am still learning the apply family of functions and am struggling to find it.

I have a data frame that looks like the following, except that instead of 5 factor variables there are dozens, and instead of 10 rows there are hundreds.

    a<- data.frame("id" = c(1:10),
                   "a1" = factor(c(0,0,1,1,0,1,0,1,0,1)),
                   "a2" = factor(c(0,0,0,0,0,0,0,0,1,0)), 
                   "a3" = factor(c(0,0,0,0,0,1,0,0,0,0)),
                   "a4" = factor(c(0,0,0,0,0,0,0,0,1,1)), 
                   "a5" = factor(c(0,0,0,1,0,0,0,0,0,0)))

I want to create a new variable that is 1 if any of 13 columns contains a particular level of the factor. The equivalent in the example data frame would be a new variable called "b" that is 1 if there's a "1" in any of the columns a1:a4, which would look like the following.

    a<- data.frame("id" = c(1:10),
                   "a1" = factor(c(0,0,1,1,0,1,0,1,0,1)),
                   "a2" = factor(c(0,0,0,0,0,0,0,0,1,0)), 
                   "a3" = factor(c(0,0,0,0,0,1,0,0,0,0)),
                   "a4" = factor(c(0,0,0,0,0,0,0,0,1,1)), 
                   "a5" = factor(c(0,0,0,1,0,0,0,0,0,0)), 
                   "b"  = c(0,0,1,1,0,1,0,1,1,1))

There has GOT to be a way to do this using the 13 column positions instead of writing a conditional if-then statement for each of the 13 variables.

Upvotes: 0

Views: 200

Answers (3)

Rich Scriven

Reputation: 99321

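You could also use any after comparing the selected columns to 1, which gives a logical matrix. The pattern is anchored so that, with dozens of columns, it does not also match names such as a10 or a12.

    apply(a[grep("^a[1-4]$", names(a))] == 1, 1, any) + 0
    # [1] 0 0 1 1 0 1 0 1 1 1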

Or, with a logical index from grepl:

    apply(a[grepl("^a[1-4]$", names(a))] == 1, 1, any) + 0
    # [1] 0 0 1 1 0 1 0 1 1 1
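
Since the question asks for column positions rather than names, the same idea can be written with an integer index and stored back in the data frame. This is only a minimal sketch: `cols` is an illustrative name, and positions 2:5 are assumed to correspond to a1:a4 in the example data.

```r
# Example data from the question
a <- data.frame(id = 1:10,
                a1 = factor(c(0,0,1,1,0,1,0,1,0,1)),
                a2 = factor(c(0,0,0,0,0,0,0,0,1,0)),
                a3 = factor(c(0,0,0,0,0,1,0,0,0,0)),
                a4 = factor(c(0,0,0,0,0,0,0,0,1,1)),
                a5 = factor(c(0,0,0,1,0,0,0,0,0,0)))

# Assumed positions of a1:a4; replace with the real 13 positions
cols <- 2:5

# Comparing to "1" gives a logical matrix; any() collapses each row
a$b <- as.integer(apply(a[cols] == "1", 1, any))
a$b
# [1] 0 0 1 1 0 1 0 1 1 1
```

Comparing against the character "1" makes it explicit that the test is on the factor level, not on the underlying integer code.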

Upvotes: 0

akrun

Reputation: 886938

In case you want to try lapply:

    Reduce(`|`, lapply(a[,-1], function(x) as.numeric(as.character(x)))) + 0
    # [1] 0 0 1 1 0 1 0 1 1 1

Or just:

    Reduce(`|`, lapply(a[,-1], `==`, 1)) + 0
    # [1] 0 0 1 1 0 1 0 1 1 1
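
Note that `a[,-1]` drops only `id`, so `a5` is checked as well; the result happens to be the same for this example. To restrict the check to chosen column positions, something along these lines should work. It is a sketch only: `cols` and `b` are illustrative names, and positions 2:5 are assumed to be a1:a4.

```r
# Example data from the question
a <- data.frame(id = 1:10,
                a1 = factor(c(0,0,1,1,0,1,0,1,0,1)),
                a2 = factor(c(0,0,0,0,0,0,0,0,1,0)),
                a3 = factor(c(0,0,0,0,0,1,0,0,0,0)),
                a4 = factor(c(0,0,0,0,0,0,0,0,1,1)),
                a5 = factor(c(0,0,0,1,0,0,0,0,0,0)))

# Assumed positions of a1:a4
cols <- 2:5

# One logical vector per column, then combine them element-wise with OR
b <- as.integer(Reduce(`|`, lapply(a[cols], `==`, "1")))
b
# [1] 0 0 1 1 0 1 0 1 1 1
```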

Benchmarks

    set.seed(155)
    df <- as.data.frame(matrix(sample(0:1, 5000*1e4, replace=TRUE), ncol=5000))

    library(microbenchmark)
    f1 <- function() {as.numeric(rowSums(df == 1) >= 1) }
    f2 <- function() {Reduce(`|`, lapply(df, `==`, 1)) +0}
    f3 <- function() {apply(df == 1, 1, function(x) any(x %in% TRUE))+0}

    microbenchmark(f1(), f2(), f3(), unit="relative")
    # Unit: relative
    # expr       min       lq   median       uq      max neval
    # f1() 1.000000 1.000000 1.000000 1.000000 1.000000   100
    # f2() 1.040561 1.043713 1.053773 1.032932 1.045067   100
    # f3() 2.538287 2.517184 2.825253 2.477225 2.454511   100

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

Just use rowSums, something like this:

    as.numeric(rowSums(a[paste0("a", 1:5)] == 1) >= 1)
    # [1] 0 0 1 1 0 1 0 1 1 1
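
If the real columns contain missing values, the comparison yields NA and a plain rowSums would propagate it. A possible variant, sketched on a small made-up data frame (`d`, `x1`, `x2`, `cols` and `flag` are hypothetical names, not from the question), treats NA as "not a match":

```r
# Small illustrative data frame with missing values
d <- data.frame(id = 1:4,
                x1 = factor(c("0", "1", NA, "0")),
                x2 = factor(c("0", "0", "0", NA)))

# Column positions to check (assumed for this toy example)
cols <- 2:3

# na.rm = TRUE counts an NA comparison as zero matches
d$flag <- as.integer(rowSums(d[cols] == "1", na.rm = TRUE) > 0)
d$flag
# [1] 0 1 0 0
```

Whether NA should count as 0 or stay NA depends on the analysis; dropping `na.rm = TRUE` keeps rows with a missing value as NA.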

Upvotes: 4
