NickCHK

Reputation: 1233

Passing the attached data frame to a function

I am working on a function that combines information about a particular variable with some basic information about the data frame it comes from. Here is an example of what I'm talking about:

library(dplyr)  # attaches %>% so the pipe below is available

fcn <- function(var,data) {
  return(ncol(data)*mean(var))
}

df <- data.frame(a=1:10,b=1:10)

df %>% dplyr::mutate(c=fcn(a,df))

This works fine! However, it would be really neat if, in cases where the function is called inside with() or inside a dplyr verb, I could just grab the data frame/tibble object without it being explicitly passed. So ideally something like

fcn <- function(var,data=attached_data_object) {
  return(ncol(data)*mean(var))
}

df <- data.frame(a=1:10,b=1:10)

df %>% dplyr::mutate(c=fcn(a))

I've been reading up on the various environment functions. It seems like I should be able to reach into the environment that with()/dplyr creates from the data frame and pull the whole data frame out. So far I have been unable to figure out how to make this happen. Any tips appreciated! Thank you.

Upvotes: 1

Views: 69

Answers (3)

vorpal

Reputation: 318

doctorG's answer works with the magrittr pipe (%>%), but not with the native pipe (|>). To be fair to doctorG, the native pipe didn't exist yet when that answer was written.

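A more robust (and elegant) solution is to use dplyr::cur_data_all(), which returns the data frame the current dplyr verb is operating on. Note that cur_data_all() was deprecated in dplyr 1.1.0 in favour of pick().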

fcn <- function(var) {
  # Namespace-qualify the call so it works even when dplyr is not attached
  return(ncol(dplyr::cur_data_all())*mean(var))
}

df <- data.frame(a=1:10,b=1:10)

df |> 
  dplyr::mutate(c=fcn(a))
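
For newer dplyr versions, the same idea can probably be written with pick(). This is only a sketch, assuming dplyr 1.1.0 or later is installed and that pick() may be called from a helper function that is itself invoked inside a data-masking verb. One difference worth checking for your use case: on grouped data, pick(everything()) leaves out the grouping columns, so the column count may differ from cur_data_all().

```r
library(dplyr)

fcn <- function(var) {
  # Assumption: dplyr >= 1.1.0. pick(everything()) returns the columns of
  # the data the enclosing verb is working on (minus grouping columns).
  current <- dplyr::pick(dplyr::everything())
  ncol(current) * mean(var)
}

df <- data.frame(a = 1:10, b = 1:10)

result <- df |>
  dplyr::mutate(c = fcn(a))
```

Calling fcn() outside a dplyr verb should raise an error, since there is no current data to pick from.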

Upvotes: 0

doctorG

Reputation: 1731

(With apologies to Hadley if I get terms slightly wrong). You might find the chapters on Environments and NSE (non-standard evaluation) from Advanced R useful.

When you pipe with magrittr's %>%, the data frame on the left-hand side is bound to the name "." (this is a feature of the pipe rather than of dplyr itself). Hence the "." in another answer here to refer to the data frame. The dplyr verbs look up the specified column names in that data frame. When you call a function from within mutate(), as you are doing here, you want to reach this object called "." starting from the environment your function was called from. So how do we do that?

library(dplyr)

fcn <- function(var) {
  # Look up the pipe placeholder "." starting from the calling environment
  dat <- get(".", envir = parent.frame())
  return(ncol(dat) * mean(var))
}

notacol <- 8
df <- data.frame(a=1:10, b=seq(10, 100, 10))
df
    a   b
1   1  10
2   2  20
3   3  30
4   4  40
5   5  50
6   6  60
7   7  70
8   8  80
9   9  90
10 10 100


df %>% mutate(c = fcn(a), d = fcn(b), e = fcn(notacol))
    a   b  c   d  e
1   1  10 11 110 16
2   2  20 11 110 16
3   3  30 11 110 16
4   4  40 11 110 16
5   5  50 11 110 16
6   6  60 11 110 16
7   7  70 11 110 16
8   8  80 11 110 16
9   9  90 11 110 16
10 10 100 11 110 16

I think this is the behaviour you were after. Note that notacol isn't a column of the data frame, so it isn't found among the columns; the lookup then continues up to the global environment, where it is found.
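
The question also mentions with(). A similar parent.frame() trick might work there, because with() evaluates its expression in an environment built from the data frame. The following is a minimal base-R sketch under that assumption; it counts the names in the calling environment instead of retrieving a data frame object, and it will give misleading results if fcn() is called from anywhere other than inside with().

```r
fcn <- function(var) {
  # Assumption: fcn() is called directly inside with(df, ...), so the
  # calling environment holds exactly one binding per column of df.
  caller <- parent.frame()
  n_cols <- length(ls(caller))
  n_cols * mean(var)
}

df <- data.frame(a = 1:10, b = 1:10)

res <- with(df, fcn(a))
```

If the full data frame is needed rather than just its width, as.data.frame(as.list(caller)) could rebuild it, though the column order is not guaranteed to match the original.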

Upvotes: 2

Rui Barradas

Reputation: 76402

I am not sure that the following is what you want.
In any case, for the function to be used directly in a pipe, the dataset must be its first argument.

library(dplyr)

fcn <- function(data, var) {
  var <- deparse(substitute(var))
  ncol(data)*mean(data[[var]])
}

df <- data.frame(a = 1:10, b = 11:20)

df %>% fcn(a)
#[1] 11

df %>% mutate(c = fcn(., a))
#    a  b  c
#1   1 11 11
#2   2 12 11
#3   3 13 11
#4   4 14 11
#5   5 15 11
#6   6 16 11
#7   7 17 11
#8   8 18 11
#9   9 19 11
#10 10 20 11

df %>% summarise(c = fcn(., a))
#   c
#1 11

Upvotes: 0
