Ujjawal Bhandari

Reputation: 1372

Evaluate Multiple Lines in Dplyr

I have a dataset that lists variables, the calculation I want to perform on each (sum, number of distinct values), and the new variable name to use for the result.

library(dplyr)

RefDf <- read.table(text = "Variables   Calculation NewVariable
Sepal.Length    sum Sepal.Length2
Petal.Length    n_distinct  Petal.LengthNew
", header = T)

Manual approach: summarise after grouping by the Species variable.

iris %>% group_by_at("Species") %>% 
  summarise(Sepal.Length2 = sum(Sepal.Length,na.rm = T),
            Petal.LengthNew = n_distinct(Petal.Length, na.rm = T)
            )

Automate via eval(parse( ))

x <- RefDf %>% mutate(Check = paste0(NewVariable, " = ", Calculation, "(", Variables, ", na.rm = T", ")")) %>% pull(Check)
iris %>% group_by_at("Species") %>% summarise(eval(parse(text = x)))

Currently it returns:

  Species    `eval(parse(text = x))`
  <fct>                        <int>
1 setosa                           9
2 versicolor                      19
3 virginica                       20

It should return:

  Species    Sepal.Length2 Petal.LengthNew
  <fct>              <dbl>           <int>
1 setosa              250.               9
2 versicolor          297.              19
3 virginica           329.              20

Upvotes: 3

Views: 302

Answers (2)

Anoushiravan R

Reputation: 21908

Update: I found a way to avoid those extra lines.

This is just another way of getting your desired result. I'd rather create a function call for every row of your data set and then iterate over those calls alongside the new column names to get the desired output:

library(dplyr)
library(rlang)
library(purrr)

# First we add a list column to your data set holding one unevaluated call per row
RefDf %>%
  rowwise() %>%
  mutate(Call = list(call2(Calculation, parse_expr(Variables)))) -> Rf

Rf
# A tibble: 2 x 4
# Rowwise: 
  Variables    Calculation NewVariable     Call      
  <chr>        <chr>       <chr>           <list>    
1 Sepal.Length sum         Sepal.Length2   <language>
2 Petal.Length n_distinct  Petal.LengthNew <language>


# Then we iterate over `NewVariable` and `Call` in parallel: each step names the new
# column and evaluates its call per group, and the results are joined by Species

map2(Rf$NewVariable, Rf$Call, ~ iris %>% group_by(Species) %>%
         summarise(!!.x := eval_tidy(.y))) %>%
  reduce(~ left_join(.x, .y, by = "Species"))


# A tibble: 3 x 3
  Species    Sepal.Length2 Petal.LengthNew
  <fct>              <dbl>           <int>
1 setosa              250.               9
2 versicolor          297.              19
3 virginica           329.              20
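
If the per-column joins are a concern, the same kind of calls can probably be spliced into a single summarise() instead. This is a minimal sketch, assuming RefDf holds character columns (it is rebuilt here with data.frame() so the block is self-contained); the names calls and result are only illustrative, and na.rm = TRUE is added to match the manual approach in the question:

```r
library(dplyr)
library(rlang)

# Same reference table as in the question, rebuilt with character columns
RefDf <- data.frame(
  Variables   = c("Sepal.Length", "Petal.Length"),
  Calculation = c("sum", "n_distinct"),
  NewVariable = c("Sepal.Length2", "Petal.LengthNew"),
  stringsAsFactors = FALSE
)

# Build one call per row, e.g. sum(Sepal.Length, na.rm = TRUE)
calls <- Map(
  function(fn, var) call2(fn, sym(var), na.rm = TRUE),
  RefDf$Calculation, RefDf$Variables
)

# The list names become the output column names when spliced
names(calls) <- RefDf$NewVariable

# Splice all calls into one summarise(), so no joins are needed
result <- iris %>%
  group_by(Species) %>%
  summarise(!!!calls, .groups = "drop")

result
```

Because every call is evaluated in one grouped pass, this avoids grouping iris once per row of RefDf.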

Upvotes: 3

r.user.05apr

Reputation: 5456

You can use parse_exprs:

library(tidyverse)
library(rlang)

RefDf <- read.table(text = "Variables   Calculation NewVariable
Sepal.Length    sum Sepal.Length2
Petal.Length    n_distinct  Petal.LengthNew
", header = T)

# Build a named character vector of expression strings; the names become the new column names
expr_txt <- set_names(str_c(RefDf$Calculation, "(", RefDf$Variables, ")"), 
                      RefDf$NewVariable)

iris %>%
     group_by_at("Species") %>%
     summarise(!!!parse_exprs(expr_txt), .groups = "drop")

## A tibble: 3 x 3
#Species    Sepal.Length2 Petal.LengthNew
#<fct>              <dbl>           <int>
#1 setosa              250.               9
#2 versicolor          297.              19
#3 virginica           329.              20
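
As for why the eval(parse()) attempt in the question produced a single unnamed column: as far as I can tell, parse() returns one expression per string, eval() runs them in turn but returns only the last value, and `name = value` at the top level of parsed code is an assignment rather than a named argument. The toy strings and the names exprs, env and last_value below are only illustrative, using base R alone:

```r
# Two strings shaped like the ones built in the question
x <- c("a = 1 + 1", "b = 2 * 5")

# parse() returns an expression vector with one element per string
exprs <- parse(text = x)
n_exprs <- length(exprs)

# eval() evaluates each element in turn but returns only the last value;
# `a = 1 + 1` is treated as an assignment into `env`, not as a named argument
env <- new.env()
last_value <- eval(exprs, envir = env)

last_value          # value of the last expression only
ls(env)             # "a" and "b" were created as side effects
```

That would explain why summarise() only received the n_distinct result, under the name of the whole eval(...) expression, whereas !!!parse_exprs() passes each expression as its own named argument.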

Upvotes: 3
