dplyr - mutate formula based on similarities in column names

Question

I am trying to find a better way to run a mutate() on a combination of columns based on parts of the column names.

For example, a way to simplify the mutate function in the following code:

df <- data.frame(LIMITED_A = c(100,200),
                UNLIMITED_A = c(25000,50000),
                LIMITED_B = c(300,300),
                UNLIMITED_B = c(500,500),
                LIMITED_C = c(2,10),
                UNLIMITED_C = c(5,20))

df %>%
  mutate(FINAL_LIMITED = (LIMITED_A - LIMITED_B) / LIMITED_C,
         FINAL_UNLIMITED = (UNLIMITED_A - UNLIMITED_B) / UNLIMITED_C)

A formula with the form: (._A - ._B) / ._C and the result is given the name FINAL_.

Is there a way to simplify this to a single line of code in the mutate function?

acylam · Accepted Answer

Here is a different approach:

library(dplyr)
library(rlang)
library(glue)

dynamic_mutate = function(DF,  
                          col_names = gsub("(.*)_\w+$", "\1", names(DF)), 
                          expression = "({x}_A - {x}_B)/{x}_C",
                          prefix = "FINAL"){

  name_list = col_names %>% 
    unique() %>%
    as.list()

  expr_list = name_list %>%
    lapply(function(x) parse_quosure(glue(expression))) %>% 
    setNames(paste(prefix, name_list, sep = "_")) 

  DF %>% mutate(!!!expr_list)

}

Result:

> df %>%
+   dynamic_mutate()
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED
1       100       25000       300         500         2           5          -100
2       200       50000       300         500        10          20           -10
  FINAL_UNLIMITED
1            4900
2            2475

> df %>%
+   dynamic_mutate(c("LIMITED", "UNLIMITED"), prefix = "NEW")
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_LIMITED
1       100       25000       300         500         2           5        -100
2       200       50000       300         500        10          20         -10
  NEW_UNLIMITED
1          4900
2          2475

> df %>%
+   dynamic_mutate(c("UNLIMITED"), prefix = "NEW")
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_UNLIMITED
1       100       25000       300         500         2           5          4900
2       200       50000       300         500        10          20          2475

> df %>% 
+   dynamic_mutate(c("A", "B", "C"), "LIMITED_{x} + UNLIMITED_{x}")
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_A FINAL_B FINAL_C
1       100       25000       300         500         2           5   25100     800       7
2       200       50000       300         500        10          20   50200     800      30

Notes:

This approach uses lapply and glue to construct expressions from prefixes extracted using gsub (or you can supply your own prefixes/suffixes). parse_quosure from rlang is then used to parse the expression into a quosure. As a result, expr_list is a named list of quosure's which I can then use !!! to unquote and splice the arguments into separate expressions in mutate.

You can change the formula by adjusting the expression argument as shown in the last example.

The advantage of this method is that it is quite fast because I am mainly manipulating column names and creating strings (expressions). The disadvantage is that it uses multiple packages.

dplyr - mutate formula based on similarities in column names

Answers (2)

Related Questions