Using ifelse conditional on multiple columns

Question

I want to generate variables to check whether a specific event occurred over multiple conditions. A sample dataframe is below.

df <- data.frame(
    index = c(1:20),
    con1 = c(1,3,2,4,2,7,5,9,1,2,5,6,1,0,8,0,4,5,7,3),
    con2 = c(3,5,1,6,3,4,7,3,2,1,5,7,9,1,4,2,4,3,4,3),
    con3 = c(2,7,3,4,1,9,4,0,7,0,5,2,7,5,9,3,5,2,1,2))

The actual dataset has 20 conditions (con*) and 10 different event types (each number in the con* columns).

What I am doing now is using a tedious command like this:

df %>% mutate (Event1 = ifelse (con1==1 | con2==1 | con3==1,1,0))
df %>% mutate (Event2 = ifelse (con1==2 | con2==2 | con3==2,1,0))
...

It gives exactly what I want. However, you can imagine how messy this makes the script with 20 conditions and 10 different events. Do you have any idea how I can make it neater?

r2evans · Accepted Answer

A hack:

library(dplyr)
library(purrr) # map_dfc
events <- setNames(1:4, paste0("Event", 1:4))
df %>%
  bind_cols(map_dfc(events, ~ +(rowSums(df[,-1] == .) > 0))) %>%
  head()
#   index con1 con2 con3 Event1 Event2 Event3 Event4
# 1     1    1    3    2      1      1      1      0
# 2     2    3    5    7      0      0      1      0
# 3     3    2    1    3      1      1      1      0
# 4     4    4    6    4      0      0      0      1
# 5     5    2    3    1      1      1      1      0
# 6     6    7    4    9      0      0      0      1

This also works without purrr::map_dfc:

library(dplyr)
df %>%
  bind_cols(lapply(events, function(ev) +(rowSums(df[,-1] == ev) > 0)))

# or even just
cbind(df, lapply(events, function(ev) +(rowSums(df[,-1] == ev) > 0)))

The use of df[,-1] is based on the premise that you're working on all columns except the first. It can also be replaced with some tidyverse verb (select(df, starts_with("con"))) for the same effect.
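As a minimal sketch of that tidyverse variant, assuming the condition columns all share the "con" prefix: the six-row df below is just the first rows of the question's sample data, and cons and out are names introduced here for illustration.

```r
library(dplyr)

# First six rows of the sample data from the question
df <- data.frame(
  index = 1:6,
  con1 = c(1, 3, 2, 4, 2, 7),
  con2 = c(3, 5, 1, 6, 3, 4),
  con3 = c(2, 7, 3, 4, 1, 9)
)
events <- setNames(1:4, paste0("Event", 1:4))

# Pick the condition columns by name instead of by position
cons <- select(df, starts_with("con"))

# One 0/1 indicator column per event, appended to the original data
out <- bind_cols(df, lapply(events, function(ev) +(rowSums(cons == ev) > 0)))
head(out)
```

Selecting by name keeps the code correct if other non-condition columns are added later, which df[,-1] would silently include in the comparison.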

The underlying mechanism of this answer is the use of rowSums and ==. The comparison df[,-1] == ev returns a matrix of logicals, one cell per row and condition column. With a matrix of true/false, we can take the row-wise sum, where false=0 and true=1. Any sum above 0 then means at least one column in that row is true.

The +(...) is a quick hack to convert logicals to integers.
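The steps can be pulled apart to see what each one contributes. This is only an illustrative sketch on a three-row example (the first rows of the sample data); m, hits, n_hits, any_hit and event1 are names made up for the illustration.

```r
# Three rows of condition columns, taken from the sample data
m <- data.frame(con1 = c(1, 3, 2), con2 = c(3, 5, 1), con3 = c(2, 7, 3))

hits    <- m == 1         # logical matrix: TRUE where a cell equals 1
n_hits  <- rowSums(hits)  # TRUE counts as 1, FALSE as 0
any_hit <- n_hits > 0     # TRUE if at least one column matched in that row
event1  <- +any_hit       # unary plus coerces logical to integer

event1
```

If the unary plus feels too terse, as.integer(any_hit) should give the same values and states the intent more explicitly.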
