How to create multiple new columns based on multiple conditional statements?

Question

[ First Stack question please be kind :) ]

I'm creating multiple new columns in a data frame based on multiple conditional statements of existing columns - all essentially new combinations of columns.

For example, if there are 4 columns (a:d), I need new columns of all combinations (abcd, abc, abd, etc) and a 0/1 coding based on threshold data in a:d.

Toy data example included and desired outcome. However needs to be scalable: there are 4 base columns, but I need all combinations of 2, 3 and 4 columns not just 3-value (abc, abd, .... ab, ac, ad, ... total n = 11)

[Background for context: this is actually flow cytometry data from multipotent stem cells that can grow into colonies of all lineage cell type (multipotent, or abcd) or progressively more restricted populations (only abc, or abd, ab, ac, etc)

# Toy data set
set.seed(123)
df <- tibble(a = c(sample(10:50, 10)),
             b = c(sample(10:50, 10)),
             c = c(sample(10:50, 10)),
             d = c(sample(10:50, 10)))

Current code produces the desired result however, this needs 11 lines of repetitive code which is error prone and I hope has a more elegant solution:

df %>% 
  mutate(
    abcd = if_else(a > 30 & b > 20 & c > 30 & d > 30, 1, 0),
    abc = if_else(a > 30 & b > 20 & c > 30 & d <= 30, 1, 0),
    abd = if_else(a > 30 & b > 20 & c <= 30 & d > 30, 1, 0),
    acd = if_else(a > 30 & b <= 20 & c > 30 & d > 30, 1, 0),
    bcd = if_else(a <= 30 & b > 20 & c > 30 & d > 30, 1, 0))

nsinghphd · Accepted Answer

What I understand from your question, for each row you just need to find which columns meet the criteria defined in your ifelse() conditions. This vectorized solution will add a column to your df which contains all the combinations. This probably is also faster than multiple ifelse conditions as well. Finally, the new column can be used for ordering or grouping.

# define the threshold levels for all columns
threshold = c(a=30, b=20, c=30, d=30)

# get names of columns meeting the threshold and paste names
df$combn <- apply(df, 1, function(x) {
  paste(names(x)[x > threshold], collapse = "")
})

> df
# A tibble: 10 x 5
       a     b     c     d combn
       
 1    21    49    46    49 bcd  
 2    41    28    37    46 abcd 
 3    25    36    34    36 bcd  
 4    43    31    47    40 abcd 
 5    44    13    48    10 ac   
 6    11    42    35    27 bc   
 7    28    18    29    48 d    
 8    40    11    30    17 a    
 9    46    20    19    20 a    
10    24    40    14    43 bd

How to create multiple new columns based on multiple conditional statements?

Answers (2)

Related Questions