Koopa

Reputation: 87

How to use an expression in dplyr::mutate in R

I want to add a new column whose name and definition are both given in a character string. In the example below, I want to add the column d defined in expr:

library(magrittr)

data <- tibble::tibble(
  a = c(1, 2),
  b = c(3, 4)
)

expr <- "d = a + b"

This should give the same result as:

data %>%
  dplyr::mutate(d = a + b)

# # A tibble: 2 x 3
#       a     b     d
#   <dbl> <dbl> <dbl>
# 1     1     3     4
# 2     2     4     6

However, with each of the attempts below, the calculation itself (the addition) works, but the new column is not named d as I expected.

data %>%
  dplyr::mutate(!!rlang::parse_expr(expr))

# # A tibble: 2 x 3
#       a     b `d = a + b`
#   <dbl> <dbl>       <dbl>
# 1     1     3           4
# 2     2     4           6

data %>%
  dplyr::mutate(!!rlang::parse_quo(expr, env = rlang::global_env()))

# # A tibble: 2 x 3
#       a     b `d = a + b`
#   <dbl> <dbl>       <dbl>
# 1     1     3           4
# 2     2     4           6

data %>%
  dplyr::mutate(rlang::eval_tidy(rlang::parse_expr(expr)))

# # A tibble: 2 x 3
#       a     b `rlang::eval_tidy(rlang::parse_expr(expr))`
#   <dbl> <dbl>                                       <dbl>
# 1     1     3                                           4
# 2     2     4                                           6
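Inspecting what parse_expr() returns shows why the column ends up named `d = a + b`. This is a small sketch, assuming rlang is installed and that the string uses = (not <-); the object name e is only illustrative.

```r
library(rlang)

expr <- "d = a + b"
e <- parse_expr(expr)

# The whole string becomes ONE call whose function is `=`.
# Injected with !!, mutate() therefore sees a single unnamed argument
# and names the column after its deparsed text, "d = a + b".
is_call(e, "=")

# The two pieces mutate() actually needs, a name and a value:
e[[2]]  # left-hand side: the symbol d
e[[3]]  # right-hand side: the call a + b
```

Pulling the call apart into these two pieces is what the answers below do, in one way or another.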

How can I properly use an expression in dplyr::mutate?

My question is similar to this, but in my example, the new variable (d) and its definition (a + b) are given in a single character vector (expr).

Upvotes: 5

Views: 1532

Answers (3)

G. Grothendieck

Reputation: 269481

Any of these work. The second is similar to the first but does not require rlang to be on the search path. The third and fourth also work if expr has no d = part, in which case default names are used. The last one uses only base R and is also the shortest.

data %>% mutate(within(., !!parse_expr(expr)))

data %>% mutate(within(., !!parse(text = expr)))

data %>% mutate(!!parse_expr(sprintf("tibble(%s)", expr)))

data %>% { eval_tidy(parse_expr(sprintf("mutate(., %s)", expr))) }

within(data, eval(parse(text = expr)))  # base R

Note

Assume this preamble:

library(dplyr)
library(rlang)

# input
data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"
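A quick way to check two of the claims above: the base R one-liner adds d, and a string without the d = part leaves mutate() to choose a default name. This is a sketch assuming dplyr and rlang are installed; res_base and res_default are illustrative names, not part of the answer.

```r
library(dplyr)
library(rlang)

data <- tibble(a = c(1, 2), b = c(3, 4))

# Base R variant: parse the string and evaluate it inside the data frame,
# so the assignment d = a + b creates a column d.
res_base <- within(data, eval(parse(text = "d = a + b")))

# Fourth variant with a string that has no "d =" part: mutate() falls back
# to a default column name built from the expression text.
res_default <- data %>% { eval_tidy(parse_expr(sprintf("mutate(., %s)", "a + b"))) }

names(res_base)
names(res_default)
```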

Upvotes: 2

ekolima

Reputation: 598

To get the desired name for the new column, you can keep the same syntax and assign the result to a column with the preferred name using :=. To get that name, use a regular expression to extract everything before the = and then remove any leading or trailing spaces.

library(stringr)  # for str_extract()

expr <- "x = a * b"
# everything before the first "=", with surrounding spaces removed
col_name <- trimws(str_extract(expr, "[^=]+"))

data %>%
   dplyr::mutate(!!col_name := !!rlang::parse_expr(expr))
# # A tibble: 2 × 3
#       a     b     x
#   <dbl> <dbl> <dbl>
# 1     1     3     3
# 2     2     4     8

data %>%
   dplyr::mutate(!!col_name := !!rlang::parse_quo(expr, env = rlang::global_env()))
# # A tibble: 2 × 3
#       a     b     x
#   <dbl> <dbl> <dbl>
# 1     1     3     3
# 2     2     4     8

data %>%
   dplyr::mutate(!!col_name := rlang::eval_tidy(rlang::parse_expr(expr)))
# # A tibble: 2 × 3
#       a     b     x
#   <dbl> <dbl> <dbl>
# 1     1     3     3
# 2     2     4     8
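A variation on the same idea, assuming the string always has the form name = expression: instead of a regular expression, take the name and the right-hand side directly from the parsed call, whose second element is the left-hand side and third element the right-hand side. This is a sketch, not part of the answer above; e, rhs and res are illustrative names.

```r
library(dplyr)
library(rlang)

data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "x = a * b"

# parse_expr() turns the string into a single call to `=`:
# element 2 is the left-hand side (x), element 3 the right-hand side (a * b).
e <- parse_expr(expr)
col_name <- as_string(e[[2]])
rhs <- e[[3]]

# Inject the name with := and only the right-hand side with !!,
# so the assignment itself is never evaluated.
res <- data %>% mutate(!!col_name := !!rhs)
res
```

Because only a + b style right-hand sides are injected, the resulting column is named x without any string trimming.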

Upvotes: 1

TimTeaFan

Reputation: 18551

Let's first look at what kind of input dplyr::mutate needs to create named variables: a named list of expressions, spliced in with !!!. Each list element's name becomes the name of a new column, and its expression defines that column's values.

library(tidyverse)

data <- tibble::tibble(
  a = c(1, 2),
  b = c(3, 4)
)

expr <- "d = a + b"
# let's rewrite the string above as named list containing an expression.
expr2 <- list(d = expr(a + b))

# this works as expected:
data %>% 
  mutate(!!! expr2)

#> # A tibble: 2 x 3
#>       a     b     d
#>   <dbl> <dbl> <dbl>
#> 1     1     3     4
#> 2     2     4     6
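The same kind of named list can also be built with rlang::exprs(), which captures each argument unevaluated and keeps its name, and it can hold several definitions at once. A sketch assuming dplyr and rlang are installed; the extra column p and the names expr_ls and res are illustrative only.

```r
library(dplyr)
library(rlang)

data <- tibble(a = c(1, 2), b = c(3, 4))

# exprs() returns list(d = quote(a + b), p = quote(a * b)):
# the same structure as list(d = expr(a + b)), with two entries.
expr_ls <- exprs(d = a + b, p = a * b)

# !!! splices each element into mutate() as a named argument
res <- data %>% mutate(!!!expr_ls)
res
```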

Now we just need a function that turns the string into a named list: the name is the left-hand side of the equation, and the list element is the right-hand side as an expression. We can get both sides with ordinary string manipulation, and then use base R's str2lang to turn the right-hand side from a string into an expression.

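create_expr_ls <- function(str_expr) {
  # name of the new column: the leading word, i.e. the left-hand side
  expr_nm <- str_extract(str_expr, "^\\w+")
  # right-hand side: drop "name =" (with any surrounding spaces), keep the rest
  expr_code <- str_replace_all(str_expr, "(^\\w+\\s*=\\s*)(.*)", "\\2")
  # one-element list holding the parsed right-hand side, named after the left-hand side
  set_names(list(str2lang(expr_code)), expr_nm)
}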

expr3 <- create_expr_ls(expr)

data %>% 
  mutate(!!! expr3)

#> # A tibble: 2 x 3
#>       a     b     d
#>   <dbl> <dbl> <dbl>
#> 1     1     3     4
#> 2     2     4     6

Created on 2022-01-23 by the reprex package (v0.3.0)
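If several definitions arrive as a character vector, the one-element lists returned for each string can simply be concatenated before splicing. This is a sketch under that assumption; it repeats a copy of the create_expr_ls() helper (written with \\s* so any amount of space around = is accepted) so the block runs on its own, and exprs_str, expr_ls and res are illustrative names.

```r
library(dplyr)
library(stringr)
library(rlang)

# copy of the helper: string "name = code" -> list(name = <parsed code>)
create_expr_ls <- function(str_expr) {
  expr_nm <- str_extract(str_expr, "^\\w+")
  expr_code <- str_replace_all(str_expr, "(^\\w+\\s*=\\s*)(.*)", "\\2")
  set_names(list(str2lang(expr_code)), expr_nm)
}

data <- tibble(a = c(1, 2), b = c(3, 4))
exprs_str <- c("d = a + b", "p = a * b")

# one single-element named list per string, concatenated into one named list
expr_ls <- do.call(c, lapply(exprs_str, create_expr_ls))

res <- data %>% mutate(!!!expr_ls)
res
```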

Upvotes: 6
