Reputation: 331
I am trying to find a better way to run a mutate()
on a combination of columns based on parts of the column names.
For example, a way to simplify the mutate
function in the following code:
df <- data.frame(LIMITED_A = c(100,200),
UNLIMITED_A = c(25000,50000),
LIMITED_B = c(300,300),
UNLIMITED_B = c(500,500),
LIMITED_C = c(2,10),
UNLIMITED_C = c(5,20))
df %>%
mutate(FINAL_LIMITED = (LIMITED_A - LIMITED_B) / LIMITED_C,
FINAL_UNLIMITED = (UNLIMITED_A - UNLIMITED_B) / UNLIMITED_C)
A formula with the form: (._A - ._B) / ._C
and the result is given the name FINAL_.
Is there a way to simplify this to a single line of code in the mutate
function?
Upvotes: 2
Views: 589
Reputation: 18681
Here is a different approach:
library(dplyr)
library(rlang)
library(glue)
dynamic_mutate = function(DF,
col_names = gsub("(.*)_\\w+$", "\\1", names(DF)),
expression = "({x}_A - {x}_B)/{x}_C",
prefix = "FINAL"){
name_list = col_names %>%
unique() %>%
as.list()
expr_list = name_list %>%
lapply(function(x) parse_quosure(glue(expression))) %>%
setNames(paste(prefix, name_list, sep = "_"))
DF %>% mutate(!!!expr_list)
}
Result:
> df %>%
+ dynamic_mutate()
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED
1 100 25000 300 500 2 5 -100
2 200 50000 300 500 10 20 -10
FINAL_UNLIMITED
1 4900
2 2475
> df %>%
+ dynamic_mutate(c("LIMITED", "UNLIMITED"), prefix = "NEW")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_LIMITED
1 100 25000 300 500 2 5 -100
2 200 50000 300 500 10 20 -10
NEW_UNLIMITED
1 4900
2 2475
> df %>%
+ dynamic_mutate(c("UNLIMITED"), prefix = "NEW")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_UNLIMITED
1 100 25000 300 500 2 5 4900
2 200 50000 300 500 10 20 2475
> df %>%
+ dynamic_mutate(c("A", "B", "C"), "LIMITED_{x} + UNLIMITED_{x}")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_A FINAL_B FINAL_C
1 100 25000 300 500 2 5 25100 800 7
2 200 50000 300 500 10 20 50200 800 30
Notes:
This approach uses lapply
and glue
to construct expressions from prefixes extracted using gsub
(or you can supply your own prefixes/suffixes). parse_quosure
from rlang
is then used to parse the expression into a quosure
. As a result, expr_list
is a named list of quosure
's which I can then use !!!
to unquote and splice the arguments into separate expressions in mutate
.
You can change the formula by adjusting the expression
argument as shown in the last example.
The advantage of this method is that it is quite fast because I am mainly manipulating column names and creating strings (expressions). The disadvantage is that it uses multiple packages.
Upvotes: 2
Reputation: 39154
One idea is to convert the data frame to long format and conduct the calculation.
library(dplyr)
library(tidyr)
df2 <- df %>%
mutate(ID = 1:n()) %>%
gather(Type, Value, -ID) %>%
separate(Type, into = c("Type", "Group")) %>%
spread(Group, Value) %>%
mutate(Final = (A - B)/C)
df2
ID Type A B C Final
1 1 LIMITED 100 300 2 -100
2 1 UNLIMITED 25000 500 5 4900
3 2 LIMITED 200 300 10 -10
4 2 UNLIMITED 50000 500 20 2475
And you can always convert the data frame back to the wide format.
df3 <- df2 %>%
gather(Group, Value, A:Final) %>%
unite(Col, Type, Group) %>%
spread(Col, Value) %>%
select(colnames(df),
FINAL_LIMITED = LIMITED_Final,
FINAL_UNLIMITED = UNLIMITED_Final)
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED FINAL_UNLIMITED
1 100 25000 300 500 2 5 -100 4900
2 200 50000 300 500 10 20 -10 2475
Upvotes: 2