Reputation: 87
I want to add a new column based on a given character vector.
For example, in the code below, I want to add the column d defined in expr:
library(magrittr)
data <- tibble::tibble(
a = c(1, 2),
b = c(3, 4)
)
expr <- "d = a + b"
just as below:
data %>%
dplyr::mutate(d = a + b)
# # A tibble: 2 x 3
# a b d
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
However, in the code below, the calculation itself (the addition) works, but the name of the new column is not what I expected.
data %>%
dplyr::mutate(!!rlang::parse_expr(expr))
# # A tibble: 2 x 3
# a b `d = a + b`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
data %>%
dplyr::mutate(!!rlang::parse_quo(expr, env = rlang::global_env()))
# # A tibble: 2 x 3
# a b `d = a + b`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
data %>%
dplyr::mutate(rlang::eval_tidy(rlang::parse_expr(expr)))
# # A tibble: 2 x 3
# a b `rlang::eval_tidy(rlang::parse_expr(expr))`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
How can I properly use an expression in dplyr::mutate?
My question is similar to this, but in my example, the new variable (d) and its definition (a + b) are given in a single character vector (expr).
Upvotes: 5
Views: 1532
Reputation: 269481
Any of these work. The second is similar to the first but does not require that rlang be on the search path. The third and fourth also work if the d= part is not present in expr, in which case default names are used. The last one uses only base R and is also the shortest.
data %>% mutate(within(., !!parse_expr(expr)))
data %>% mutate(within(., !!parse(text = expr)))
data %>% mutate(data, !!parse_expr(sprintf("tibble(%s)", expr)))
data %>% { eval_tidy(parse_expr(sprintf("mutate(., %s)", expr))) }
within(data, eval(parse(text = expr))) # base R
Assume this preamble:
library(dplyr)
library(rlang)
# input
data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"
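The fourth variant above can also be wrapped in a small helper, which avoids relying on the pipe's . placeholder. This is only a sketch under assumptions: mutate_from_string is a hypothetical name, not a dplyr or rlang function, and because it evaluates arbitrary text as code it should only be given strings you trust.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: builds the text "mutate(df, <expr_string>)",
# parses it into a call, and evaluates it inside this function so
# that `df` resolves to the function argument.
mutate_from_string <- function(df, expr_string) {
  call <- parse_expr(sprintf("mutate(df, %s)", expr_string))
  eval_tidy(call, env = current_env())
}

data <- tibble(a = c(1, 2), b = c(3, 4))

# With a name in the string, the new column is called d
named <- mutate_from_string(data, "d = a + b")
names(named)

# Without a name, mutate falls back to its default column name
unnamed <- mutate_from_string(data, "a + b")
names(unnamed)
```

Building the whole mutate call as text means the d = part is parsed as an argument name rather than as part of the expression, which is why the column name comes out as intended.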
Upvotes: 2
Reputation: 598
To get the desired name for the new column, you can keep the same syntax and assign the result to a column with the preferred name. To get this name, use a regular expression to extract everything before the = sign, then trim any leading or trailing spaces.
expr <- "x = a * b"
col_name <- trimws(stringr::str_extract(expr, "[^=]+"))
data %>%
dplyr::mutate(!!col_name := !!rlang::parse_expr(expr))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
data %>%
dplyr::mutate(!!col_name := !!rlang::parse_quo(expr, env = rlang::global_env()))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
data %>%
dplyr::mutate(!!col_name := rlang::eval_tidy(rlang::parse_expr(expr)))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
Upvotes: 1
Reputation: 18551
Let's first look at what kind of input dplyr::mutate takes to create named variables: we need a named list of expressions, where each element's name becomes the column name and each expression defines that column's values.
library(tidyverse)
data <- tibble::tibble(
a = c(1, 2),
b = c(3, 4)
)
expr <- "d = a + b"
# let's rewrite the string above as a named list containing an expression.
expr2 <- list(d = expr(a + b))
# this works as expected:
data %>%
mutate(!!! expr2)
#> # A tibble: 2 x 3
#> a b d
#> <dbl> <dbl> <dbl>
#> 1 1 3 4
#> 2 2 4 6
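For contrast, it may help to inspect what parsing the whole string produces. The sketch below (variable names are illustrative only) shows that "d = a + b" parses to a single call to the = operator. When that call is injected with !!, mutate receives one unnamed argument, so d never becomes an argument name and the column is labelled with the deparsed expression instead.

```r
library(rlang)

parsed <- parse_expr("d = a + b")

# The whole string is one call whose function is `=`,
# with `d` and `a + b` as its two arguments.
is_call(parsed, "=")
as.list(parsed)

# Wrapping it in a list gives an unnamed element,
# unlike the named list built by hand above.
names(list(parsed))           # NULL
names(list(d = expr(a + b)))  # "d"
```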
Now we simply need a function that transforms a string into a named list containing the expression on the right-hand side of the equation. The name needs to be the left-hand side of the equation. We can do this with regular string manipulation. Finally, we need to transform the right-hand side of the equation from a string into an expression. We can use base R's str2lang here.
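create_expr_ls <- function(str_expr) {
  # left-hand side: the leading word is the new column name
  expr_nm <- str_extract(str_expr, "^\\w+")
  # right-hand side: drop the name, the "=" and any surrounding spaces
  expr_code <- str_replace_all(str_expr, "(^\\w+\\s*=\\s*)(.*)", "\\2")
  # parse the right-hand side and name the list element after the column
  set_names(list(str2lang(expr_code)), expr_nm)
}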
expr3 <- create_expr_ls(expr)
data %>%
mutate(!!! expr3)
#> # A tibble: 2 x 3
#> a b d
#> <dbl> <dbl> <dbl>
#> 1 1 3 4
#> 2 2 4 6
Created on 2022-01-23 by the reprex package (v0.3.0)
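An alternative to splitting the string with regular expressions is to let R's parser do the splitting. The sketch below is a variant under assumptions, not part of the answer above: it wraps the string in list(...), parses that, and takes the call's arguments with rlang::call_args, which yields the same kind of named list as create_expr_ls. The helper name parse_named_exprs is hypothetical. This form also accepts several comma-separated definitions, and right-hand sides that themselves contain = characters, such as comparisons.

```r
library(dplyr)
library(rlang)

# Hypothetical helper:
# "d = a + b, e = a == 1" -> list(d = a + b, e = a == 1)
parse_named_exprs <- function(str_expr) {
  wrapped <- parse_expr(sprintf("list(%s)", str_expr))
  # call_args() returns the arguments of the call, keeping their names
  call_args(wrapped)
}

data <- tibble(a = c(1, 2), b = c(3, 4))

exprs <- parse_named_exprs("d = a + b, e = a == 1")
names(exprs)

# Splice the named expressions into mutate, as with expr3 above
result <- data %>% mutate(!!!exprs)
result
```

As with any approach that parses text into code, this should only be used on strings from a trusted source.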
Upvotes: 6