Reputation: 3815
How can I mutate columns of a dataframe based on a character array of expressions? E.g.,
I have:
library(tidyverse)
dat <- tibble(id = 0:4,
              brand = c(NA, 'coke', 'pepsi', 'other', 'pepsi'),
              price = as.character(c(NA, 1, 1.10, 1.25, .99)))
model_feature_definitions_tmp <-
  tibble(feature_id = 0:3,
         feature_name = c("intercept", "brand_coke", "brand_pepsi", "price"),
         feature_definition = c("as.numeric(id != 0)", "as.numeric(brand == 'coke')",
                                "as.numeric(brand == 'pepsi')", "as.numeric(price)"))
I want:
# # A tibble: 5 x 4
#   intercept brand_coke brand_pepsi price
#       <dbl>      <dbl>       <dbl> <dbl>
# 1         0         NA          NA    NA
# 2         1          1           0  1.00
# 3         1          0           1  1.10
# 4         1          0           0  1.25
# 5         1          0           1  0.99
The following works:
library(tidyverse)
res_list <- list()
n <- nrow(model_feature_definitions_tmp)
for (i in 1:n) {
  mfd_i <- slice(model_feature_definitions_tmp, i)
  dat %>%
    transmute(eval(parse(text = mfd_i$feature_definition))) ->
    res_list[[i]]
}
res_list %>%
  bind_cols() %>%
  setNames(model_feature_definitions_tmp$feature_name) ->
  model_feature_space
But I doubt this is the best approach. I imagine there's a better approach that doesn't involve for-loops or *apply functions. Maybe the purrr package could be used here? tidyverse solutions are ideal, but not necessary.
Upvotes: 1
Views: 75
Reputation: 3473
Unquote splicing (rlang's !!!) works well for this task.
library(tidyverse)
dat <-
  tibble(
    id = 0:4,
    brand = c(NA, 'coke', 'pepsi', 'other', 'pepsi'),
    price = as.character(c(NA, 1, 1.10, 1.25, .99))
  )
defs <-
  tibble(
    feature_name = c("intercept", "brand_coke", "brand_pepsi", "price"),
    feature_definition =
      c("as.numeric(id != 0)", "as.numeric(brand == 'coke')",
        "as.numeric(brand == 'pepsi')", "as.numeric(price)")
  )
Essentially you're trying to do the following (I think?):
dat %>%
  transmute(
    intercept = as.numeric(id != 0),
    brand_coke = as.numeric(brand == 'coke'),
    brand_pepsi = as.numeric(brand == 'pepsi'),
    price = as.numeric(price)
  )
This is equivalent to capturing the quoted expressions first and then splicing them into the ... of dplyr::transmute:
quosures1 <-
  quos(
    intercept = as.numeric(id != 0),
    brand_coke = as.numeric(brand == 'coke'),
    brand_pepsi = as.numeric(brand == 'pepsi'),
    price = as.numeric(price)
  )
transmute(dat, !!! quosures1)
However, your expressions are stored as strings, so they must first be parsed into expressions that can then be quoted. Here I map over the strings to generate a list of expressions, which I splice into quos to make a list of quosures. I name the elements of the list so that they are used as the LHS names in transmute:
quosures2 <-
  quos(!!! map(defs$feature_definition, rlang::parse_expr)) %>%
  set_names(defs$feature_name)
transmute(dat, !!! quosures2)
Of course, I think the first version (without the quoting and splicing) will be easier for future you to read, but if you want to reduce code duplication I could see the argument for the second example (quosures1). I tend to avoid storing expressions as strings for this reason.
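If the definitions do have to stay as strings, the parse-and-splice step can be wrapped in a small function so that the string handling lives in one place. This is only a sketch: build_features is a hypothetical helper name (not part of dplyr or rlang), and it splices the parsed expressions directly instead of wrapping them in quos first, which should behave the same here because the expressions only refer to columns of the data and base functions.

```r
library(dplyr)

dat <- tibble(
  id = 0:4,
  brand = c(NA, "coke", "pepsi", "other", "pepsi"),
  price = as.character(c(NA, 1, 1.10, 1.25, .99))
)

defs <- tibble(
  feature_name = c("intercept", "brand_coke", "brand_pepsi", "price"),
  feature_definition = c(
    "as.numeric(id != 0)", "as.numeric(brand == 'coke')",
    "as.numeric(brand == 'pepsi')", "as.numeric(price)"
  )
)

# Hypothetical helper (name and signature are assumptions for illustration).
# `defs` must have character columns `feature_name` and `feature_definition`.
build_features <- function(data, defs) {
  # Parse each string into a single R expression.
  exprs <- purrr::map(defs$feature_definition, rlang::parse_expr)
  # Name the expressions so transmute() uses the names as column names.
  exprs <- purrr::set_names(exprs, defs$feature_name)
  # Splice the named list into transmute()'s `...`.
  transmute(data, !!!exprs)
}

model_feature_space <- build_features(dat, defs)
```

Keep in mind that parsing strings means any R code stored in feature_definition will be evaluated, so this only makes sense when the definitions come from a source you trust.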
Upvotes: 1