user18894435

Reputation: 521

Execute a list of purrr-style lambda formulas on a data frame

The following toy data has 5 variables, X1 to X5.

set.seed(123)
df <- data.frame(matrix(rnorm(500), 100, 5))

I want to perform specific operations on specific variables, using a named list of purrr-style lambda formulas:

fun_list <- list(
  X2 = ~ quantile(.x, c(0.1, 0.9), na.rm = TRUE),
  X4 = ~ fivenum(.x, na.rm = TRUE)
)

How can I apply fun_list to my df according to its variable names?

I know rlang::as_function() can convert a purrr-style formula into an R function, but I suspect there is a function that handles purrr-style formulas natively. Its usage might be

execute(fun_list, environment = df)

The expected output is

$X2
      10%       90% 
-1.289408  1.058432 

$X4
[1] -2.465898194 -0.737146704 -0.003508661  0.693634712  2.571458146

Upvotes: 2

Views: 87

Answers (2)

G. Grothendieck

Reputation: 269291

1) Here is a base R solution. First we define fo2fun, which takes a one-sided formula and returns the corresponding function of .x. Then execute, whose body is a single Map call, applies each converted formula to the element of envir that has the same name. As the tests show, envir may be a data frame or an environment.

fo2fun <- function(formula) {
    f <- function(.x) {}
    body(f) <- formula[[2]]
    environment(f)  <- environment(formula)
    f
}

execute <- function(funs, envir = parent.frame()) {
  # apply each formula to the element of envir with the same name
  Map(\(fo, index) fo2fun(fo)(envir[[index]]), funs, names(funs))
}

# test
expected <- list(
  X2 = quantile(df$X2, c(0.1, 0.9), na.rm = TRUE), 
  X4 = fivenum(df$X4, na.rm = TRUE)
)

execute(fun_list, df) |> identical(expected)
## [1] TRUE

execute(fun_list, list2env(df)) |> identical(expected)
## [1] TRUE

list2env(df, .GlobalEnv)
execute(fun_list) |> identical(expected)
## [1] TRUE
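
Because fo2fun copies the formula's environment onto the function it builds, a formula can also refer to variables other than .x. The following is a minimal sketch of that behaviour; the variable offset and the name add_offset are made up for illustration, and fo2fun is repeated only so that the snippet runs on its own.

```r
# Same helper as in (1), repeated so this snippet is self-contained.
fo2fun <- function(formula) {
  f <- function(.x) {}
  body(f) <- formula[[2]]                 # right-hand side of a one-sided formula
  environment(f) <- environment(formula)  # keep the formula's enclosing environment
  f
}

offset <- 10                         # hypothetical variable, for illustration only
add_offset <- fo2fun(~ .x + offset)  # offset is resolved where the formula was created
add_offset(1:3)
```

Here .x is the function's only formal argument, while offset is found through the environment in which the formula was written.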

2) This is the same as (1) except that match.funfn from the gsubfn package is used in place of fo2fun.

With this approach the formal argument need not be .x: match.funfn treats every free variable in the formula as an argument. Alternatively, the argument can be named explicitly on the left-hand side of the formula. The left-hand-side form is needed when the formula contains free variables that are not arguments, since it tells match.funfn which variable is the argument, but it can be used in any case.

library(gsubfn)

fun_list2 <- list(
  X2 = ~ quantile(var, c(0.1, 0.9), na.rm = TRUE),
  X4 = x ~ fivenum(x, na.rm = TRUE)
)

execute <- function(funs, envir = parent.frame()) {
  # apply each formula to the element of envir with the same name
  Map(\(fo, index) match.funfn(fo)(envir[[index]]), funs, names(funs))
}

# test

execute(fun_list, df) |> identical(expected)
## [1] TRUE

execute(fun_list2, df) |> identical(expected)
## [1] TRUE
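
To illustrate the left-hand-side syntax described in (2), here is a small sketch, assuming gsubfn is installed. The names scale_by and rescale are hypothetical and not part of the question.

```r
library(gsubfn)

scale_by <- 2  # hypothetical non-argument free variable

# Without a left-hand side, both x and scale_by would be taken as arguments.
# Naming x on the left-hand side marks it as the only argument, so
# scale_by is resolved as an ordinary variable when the function runs.
rescale <- match.funfn(x ~ x * scale_by)
rescale(1:3)
```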

Note

Input from question:

set.seed(123)
df <- data.frame(matrix(rnorm(500), 100, 5))

fun_list <- list(
  X2 = ~ quantile(.x, c(0.1, 0.9), na.rm = TRUE),
  X4 = ~ fivenum(.x, na.rm = TRUE)
)

Upvotes: 3

Darren Tsai

Reputation: 35554

A workaround is to use a nested map. map() accepts a purrr-style formula directly, so there is no need to call rlang::as_function() yourself.

library(purrr)

imap(fun_list, \(f, var) map(df[var], f)[[1]])

# $X2
#       10%       90% 
# -1.289408  1.058432 
# 
# $X4
# [1] -2.465898194 -0.737146704 -0.003508661  0.693634712  2.571458146

Or, more briefly, imap(fun_list, ~ map(df[.y], .x)[[1]]).
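
For comparison, here is a sketch of the explicit version that the nested map replaces, assuming purrr and rlang are both installed. The names explicit and nested are only for illustration; the data and fun_list are repeated from the question so the snippet runs on its own.

```r
library(purrr)

set.seed(123)
df <- data.frame(matrix(rnorm(500), 100, 5))
fun_list <- list(
  X2 = ~ quantile(.x, c(0.1, 0.9), na.rm = TRUE),
  X4 = ~ fivenum(.x, na.rm = TRUE)
)

# Explicit version: convert each formula, then call it on the matching column.
explicit <- imap(fun_list, \(f, var) rlang::as_function(f)(df[[var]]))

# Nested-map version: map() converts the formula, [[1]] unwraps the single result.
nested <- imap(fun_list, \(f, var) map(df[var], f)[[1]])

identical(explicit, nested)
```

The [[1]] is needed because map(df[var], f) returns a one-element list, one entry per selected column.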

Upvotes: 2
