BJ_Kim
BJ_Kim

Reputation: 27

Using ifelse conditional on multiple columns

I want to generate variables to check whether a specific event occurred over multiple conditions. A sample dataframe is below.

df <- data.frame(
    index = c(1:20),
    con1 = c(1,3,2,4,2,7,5,9,1,2,5,6,1,0,8,0,4,5,7,3),
    con2 = c(3,5,1,6,3,4,7,3,2,1,5,7,9,1,4,2,4,3,4,3),
    con3 = c(2,7,3,4,1,9,4,0,7,0,5,2,7,5,9,3,5,2,1,2))

The actual dataset has 20 conditions[con*] and 10 different event types (each number in the [con*].

What I am doing now is using a tedious command like this;

df %>% mutate (Event1 = ifelse (con1==1 | con2==1 | con3==1,1,0))
df %>% mutate (Event2 = ifelse (con1==2 | con2==2 | con3==2,1,0))
...

It gives exactly what I want to get. However, you can imagine how much mess this makes in the script, with 20 conditions and 10 different events. Do you have any idea how can I make it neat?

Upvotes: 1

Views: 553

Answers (2)

r2evans
r2evans

Reputation: 160407

Hack,

library(dplyr)
library(purrr) # map_dfc
events <- setNames(1:4, paste0("Event", 1:4))
df %>%
  bind_cols(map_dfc(events, ~ +(rowSums(df[,-1] == .) > 0))) %>%
  head()
#   index con1 con2 con3 Event1 Event2 Event3 Event4
# 1     1    1    3    2      1      1      1      0
# 2     2    3    5    7      0      0      1      0
# 3     3    2    1    3      1      1      1      0
# 4     4    4    6    4      0      0      0      1
# 5     5    2    3    1      1      1      1      0
# 6     6    7    4    9      0      0      0      1

This works without purrr::map_dfc, with

library(dplyr)
df %>%
  bind_cols(lapply(events, function(ev) +(rowSums(df[,-1] == ev) > 0)))

# or even juset
cbind(df, lapply(events, function(ev) +(rowSums(df[,-1] == ev) > 0)))

The use of df[,-1] is based on the premise that you're working on all columns except the first. It can also be replaced with some tidyverse verb (select(df, starts_with("con"))) for the same effect.

The underlying mechanism of this answer is the use of rowSums and ==. The df == ev returns a matrix of logicals. Now with a matrix of true/false, we can look for the rowwise sum, where false=0 and true=1. With that, any sum above 0 means at least one column is true.

The +(...) is a quick hack to convert logicals to integers.

Upvotes: 3

Cole
Cole

Reputation: 11255

Here is a base way to do it:

df <- data.frame(
  index = c(1:20),
  con1 = c(1,3,2,4,2,7,5,9,1,2,5,6,1,0,8,0,4,5,7,3),
  con2 = c(3,5,1,6,3,4,7,3,2,1,5,7,9,1,4,2,4,3,4,3),
  con3 = c(2,7,3,4,1,9,4,0,7,0,5,2,7,5,9,3,5,2,1,2))

total_events = 9

res = matrix(0L, nrow = nrow(df), ncol = total_events, dimnames = list(NULL, paste0('Event', seq_len(total_events))))
res[cbind(df$index, unlist(df[, -1L]))] = 1L

cbind(df, res)
#>    index con1 con2 con3 Event1 Event2 Event3 Event4 Event5 Event6 Event7 Event8
#> 1      1    1    3    2      1      1      1      0      0      0      0      0
#> 2      2    3    5    7      0      0      1      0      1      0      1      0
#> 3      3    2    1    3      1      1      1      0      0      0      0      0
#> 4      4    4    6    4      0      0      0      1      0      1      0      0
#> 5      5    2    3    1      1      1      1      0      0      0      0      0
#> 6      6    7    4    9      0      0      0      1      0      0      1      0
#> 7      7    5    7    4      0      0      0      1      1      0      1      0
#> 8      8    9    3    0      0      0      1      0      0      0      0      0
#> 9      9    1    2    7      1      1      0      0      0      0      1      0
#> 10    10    2    1    0      1      1      0      0      0      0      0      0
#> 11    11    5    5    5      0      0      0      0      1      0      0      0
#> 12    12    6    7    2      0      1      0      0      0      1      1      0
#> 13    13    1    9    7      1      0      0      0      0      0      1      0
#> 14    14    0    1    5      1      0      0      0      1      0      0      0
#> 15    15    8    4    9      0      0      0      1      0      0      0      1
#> 16    16    0    2    3      0      1      1      0      0      0      0      0
#> 17    17    4    4    5      0      0      0      1      1      0      0      0
#> 18    18    5    3    2      0      1      1      0      1      0      0      0
#> 19    19    7    4    1      1      0      0      1      0      0      1      0
#> 20    20    3    3    2      0      1      1      0      0      0      0      0
#>    Event9
## truncated....

The idea is that we use the index and conditions as indices to subset and assign 1 to those value combinations.

Upvotes: 1

Related Questions