Reputation: 27
I want to generate variables to check whether a specific event occurred over multiple conditions. A sample dataframe is below.
df <- data.frame(
index = c(1:20),
con1 = c(1,3,2,4,2,7,5,9,1,2,5,6,1,0,8,0,4,5,7,3),
con2 = c(3,5,1,6,3,4,7,3,2,1,5,7,9,1,4,2,4,3,4,3),
con3 = c(2,7,3,4,1,9,4,0,7,0,5,2,7,5,9,3,5,2,1,2))
The actual dataset has 20 conditions[con*] and 10 different event types (each number in the [con*].
What I am doing now is using a tedious command like this;
df %>% mutate (Event1 = ifelse (con1==1 | con2==1 | con3==1,1,0))
df %>% mutate (Event2 = ifelse (con1==2 | con2==2 | con3==2,1,0))
...
It gives exactly what I want to get. However, you can imagine how much mess this makes in the script, with 20 conditions and 10 different events. Do you have any idea how can I make it neat?
Upvotes: 1
Views: 553
Reputation: 160407
Hack,
library(dplyr)
library(purrr) # map_dfc
events <- setNames(1:4, paste0("Event", 1:4))
df %>%
bind_cols(map_dfc(events, ~ +(rowSums(df[,-1] == .) > 0))) %>%
head()
# index con1 con2 con3 Event1 Event2 Event3 Event4
# 1 1 1 3 2 1 1 1 0
# 2 2 3 5 7 0 0 1 0
# 3 3 2 1 3 1 1 1 0
# 4 4 4 6 4 0 0 0 1
# 5 5 2 3 1 1 1 1 0
# 6 6 7 4 9 0 0 0 1
This works without purrr::map_dfc
, with
library(dplyr)
df %>%
bind_cols(lapply(events, function(ev) +(rowSums(df[,-1] == ev) > 0)))
# or even juset
cbind(df, lapply(events, function(ev) +(rowSums(df[,-1] == ev) > 0)))
The use of df[,-1]
is based on the premise that you're working on all columns except the first. It can also be replaced with some tidyverse verb (select(df, starts_with("con"))
) for the same effect.
The underlying mechanism of this answer is the use of rowSums
and ==
. The df == ev
returns a matrix of logicals. Now with a matrix of true/false, we can look for the rowwise sum, where false=0 and true=1. With that, any sum above 0 means at least one column is true.
The +(...)
is a quick hack to convert logicals to integers.
Upvotes: 3
Reputation: 11255
Here is a base way to do it:
df <- data.frame(
index = c(1:20),
con1 = c(1,3,2,4,2,7,5,9,1,2,5,6,1,0,8,0,4,5,7,3),
con2 = c(3,5,1,6,3,4,7,3,2,1,5,7,9,1,4,2,4,3,4,3),
con3 = c(2,7,3,4,1,9,4,0,7,0,5,2,7,5,9,3,5,2,1,2))
total_events = 9
res = matrix(0L, nrow = nrow(df), ncol = total_events, dimnames = list(NULL, paste0('Event', seq_len(total_events))))
res[cbind(df$index, unlist(df[, -1L]))] = 1L
cbind(df, res)
#> index con1 con2 con3 Event1 Event2 Event3 Event4 Event5 Event6 Event7 Event8
#> 1 1 1 3 2 1 1 1 0 0 0 0 0
#> 2 2 3 5 7 0 0 1 0 1 0 1 0
#> 3 3 2 1 3 1 1 1 0 0 0 0 0
#> 4 4 4 6 4 0 0 0 1 0 1 0 0
#> 5 5 2 3 1 1 1 1 0 0 0 0 0
#> 6 6 7 4 9 0 0 0 1 0 0 1 0
#> 7 7 5 7 4 0 0 0 1 1 0 1 0
#> 8 8 9 3 0 0 0 1 0 0 0 0 0
#> 9 9 1 2 7 1 1 0 0 0 0 1 0
#> 10 10 2 1 0 1 1 0 0 0 0 0 0
#> 11 11 5 5 5 0 0 0 0 1 0 0 0
#> 12 12 6 7 2 0 1 0 0 0 1 1 0
#> 13 13 1 9 7 1 0 0 0 0 0 1 0
#> 14 14 0 1 5 1 0 0 0 1 0 0 0
#> 15 15 8 4 9 0 0 0 1 0 0 0 1
#> 16 16 0 2 3 0 1 1 0 0 0 0 0
#> 17 17 4 4 5 0 0 0 1 1 0 0 0
#> 18 18 5 3 2 0 1 1 0 1 0 0 0
#> 19 19 7 4 1 1 0 0 1 0 0 1 0
#> 20 20 3 3 2 0 1 1 0 0 0 0 0
#> Event9
## truncated....
The idea is that we use the index and conditions as indices to subset and assign 1 to those value combinations.
Upvotes: 1