Tom.L

Reputation: 33

Mutate over every possible combination of columns

I have a data frame of binary variables:

df <- data.frame(a = c(0, 1, 0, 1, 0), b = c(1, 1, 0, 0, 1), c = c(1, 0, 1, 1, 0))

And I'd like to create a column for each possible combination of my pre-existing columns:

library(tidyverse)
df %>%
  mutate(d = case_when(a == 1 & b == 1 & c == 1 ~ 1),
         e = case_when(a == 1 & b == 1 & c != 1 ~ 1),
         f = case_when(a == 1 & b != 1 & c == 1 ~ 1),
         g = case_when(a != 1 & b == 1 & c == 1 ~ 1))

But my real dataset has too many columns to do this without a function or loop. Is there an easy way to do this in R?

Upvotes: 3

Views: 818

Answers (2)

Phil

Reputation: 8117

An alternative to David's answer, but I recognize it's a little awkward:

df %>% 
 unite(comb, a:c, remove = FALSE) %>% 
 spread(key = comb, value = comb) %>% 
 mutate_if(is.character, funs(if_else(is.na(.), 0, 1)))

#>   a b c 0_0_1 0_1_0 0_1_1 1_0_1 1_1_0
#> 1 0 0 1     1     0     0     0     0
#> 2 0 1 0     0     1     0     0     0
#> 3 0 1 1     0     0     1     0     0
#> 4 1 0 1     0     0     0     1     0
#> 5 1 1 0     0     0     0     0     1

EDIT: funs() is deprecated as of dplyr 0.8.0, so the last line should be revised to:

mutate_if(is.character, list(~ if_else(is.na(.), 0, 1)))
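On more recent dplyr versions (1.0.0 and later), mutate_if() is itself superseded by across(). A hedged sketch of the same pipeline in that style, assuming the five-row df from the question and that dplyr and tidyr are installed; the name res is only for illustration:

```r
library(dplyr)
library(tidyr)

# Original five-row data frame from the question
df <- data.frame(a = c(0, 1, 0, 1, 0), b = c(1, 1, 0, 0, 1), c = c(1, 0, 1, 1, 0))

res <- df %>%
  # Build one key per row, e.g. "0_1_1", keeping a, b and c
  unite(comb, a:c, remove = FALSE) %>%
  # One column per observed key; cells are the key string or NA
  spread(key = comb, value = comb) %>%
  # Turn the character cells into 0/1 indicators
  mutate(across(where(is.character), ~ if_else(is.na(.x), 0, 1)))

res
```

Note that this approach relies on every row having a distinct combination of a, b and c; with repeated rows, spread() would likely complain about duplicate identifiers.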

Upvotes: 2

David Robinson

Reputation: 78620

First note that do.call(paste0, df) will combine all of your columns into one string per row, however many columns there are (the output here is for the six-row version of df defined further down):

do.call(paste0, df)
# [1] "011" "110" "001" "101" "010" "011"
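This works because a data frame is a list of columns, so do.call() hands each column to paste0() as a separate argument. A minimal base R sketch on the original five-row df from the question; by_hand and by_call are illustrative names, not part of the answer:

```r
# Original five-row data frame from the question
df <- data.frame(a = c(0, 1, 0, 1, 0), b = c(1, 1, 0, 0, 1), c = c(1, 0, 1, 1, 0))

# Writing the columns out by hand...
by_hand <- paste0(df$a, df$b, df$c)

# ...gives the same result as passing the whole data frame as the argument list
by_call <- do.call(paste0, df)

identical(by_hand, by_call)
# [1] TRUE

by_call
# [1] "011" "110" "001" "101" "010"
```

One caveat: column names become argument names, so a column that happened to be called collapse would be interpreted as a paste0() argument rather than as data.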

Then you can use spread() from the tidyr package to give each combination its own column. Note that you have to add an extra row-identifier column (called row below) so that spread() keeps each row separate instead of trying to combine rows that share the same values.

# I added a sixth row that copied the first to make the effect clear
df <- data.frame(a = c(0, 1, 0, 1, 0, 0), b = c(1, 1, 0, 0, 1, 1), c = c(1, 0, 1, 1, 0, 1))

# this assumes you want `type_` at the start of each new column,
# but you could use a different convention
df %>%
  mutate(type = paste0("type_", do.call(paste0, df)),
         value = 1,
         row = row_number()) %>%
  spread(type, value, fill = 0) %>%
  select(-row)

Result:

  a b c type_001 type_010 type_011 type_101 type_110
1 0 0 1        1        0        0        0        0
2 0 1 0        0        1        0        0        0
3 0 1 1        0        0        1        0        0
4 0 1 1        0        0        1        0        0
5 1 0 1        0        0        0        1        0
6 1 1 0        0        0        0        0        1
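Since tidyr 1.0.0, spread() has been superseded by pivot_wider(). A sketch of the same idea with the newer function, assuming the six-row df above and that dplyr and tidyr are installed; the name wide is only for illustration:

```r
library(dplyr)
library(tidyr)

# Six-row data frame (the sixth row repeats the first)
df <- data.frame(a = c(0, 1, 0, 1, 0, 0), b = c(1, 1, 0, 0, 1, 1), c = c(1, 0, 1, 1, 0, 1))

wide <- df %>%
  mutate(type = paste0("type_", do.call(paste0, df)),  # one key per row, e.g. "type_011"
         value = 1,                                    # indicator to place in the new columns
         row = row_number()) %>%                       # keeps repeated rows separate
  pivot_wider(names_from = type,
              values_from = value,
              values_fill = list(value = 0)) %>%       # 0 where a row lacks that combination
  select(-row)

wide
```

The row and column order may differ from the spread() result shown above, because pivot_wider() does not sort its output the same way. Like the spread() version, this only creates columns for combinations that actually occur in the data.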

Upvotes: 2
