Jesse Birchfield

Reputation: 21

R: subsetting within a function

Suppose I have a data frame in the environment, mydata, with three columns, A, B, C.

mydata = data.frame(A=c(1,2,3),
                    B=c(4,5,6),
                    C=c(7,8,9))

I can create a linear model with

lm(C ~ A, data=mydata)

I want a function to generalize this, to regress B or C on A, given just the name of the column, i.e.,

f = function(x){
  lm(x ~ A, data=mydata)
}
f(B)
f(C)

or

g = function(x){
  lm(mydata$x ~ mydata$A)
}
g(B)
g(C)

Neither of these works. I know something is wrong with how x is evaluated, and I have tried permutations of quo(), enquo() and !!, without success.

This is a simplified example, but the idea is, when I have dozens of similar models to build, each fairly complicated, with only one variable changing, I want to do so without repeating the entire formula each time.

Upvotes: 0

Views: 752

Answers (3)

akrun

Reputation: 887881

To pass an unquoted column name, one option is {{ }} from the tidyverse. With select(), the argument can be either a string or an unquoted name.

library(dplyr)
printcol2 <- function(data, x) {
  data %>%
    select({{x}})
}

printcol2(mydata, A)
#  A
#1 1
#2 2
#3 3
printcol2(mydata, 'A')
#  A
#1 1
#2 2
#3 3

If the OP wants to pass an unquoted column name through to lm():

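f1 <- function(x){
  # capture the unquoted argument as a string, e.g. "B"
  rsp <- deparse(substitute(x))
  # build the formula B ~ A
  fmla <- reformulate("A", response = rsp)
  out <- lm(fmla, data = mydata)
  # overwrite the stored call so the printed Call shows the actual formula
  out$call <- as.symbol(paste0("lm(", deparse(fmla), ", data = mydata)"))
  out
}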

f1(B)

#Call:
#lm(B ~ A, data = mydata)

#Coefficients:
#(Intercept)            A  
#          3            1  

f1(C)

#Call:
#lm(C ~ A, data = mydata)

#Coefficients:
#(Intercept)            A  
#          6            1  
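
For the "dozens of similar models" case in the question, it may be simpler to pass the column names as strings and loop over them. The following is only a sketch under that assumption; fit_on_A is a hypothetical helper name, not something from the answer above.

```r
# Sketch: fit one model per response column, passing column names as strings.
mydata <- data.frame(A = c(1, 2, 3),
                     B = c(4, 5, 6),
                     C = c(7, 8, 9))

# fit_on_A is a hypothetical helper; `response` is a column name as a string.
fit_on_A <- function(response, data) {
  fmla <- reformulate("A", response = response)  # builds e.g. B ~ A
  lm(fmla, data = data)
}

responses <- c("B", "C")
# setNames() makes the result a named list, so fits$B and fits$C work
fits <- lapply(setNames(responses, responses), fit_on_A, data = mydata)

coef(fits$B)
coef(fits$C)
```

Because the names are plain strings, no non-standard evaluation is needed, and the vector of responses can come from names(mydata) or any other source.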

Upvotes: 3

Rui Barradas

Reputation: 76661

Maybe you are looking for deparse(substitute(.)). It turns an unquoted argument into a string, which can then be pasted into a formula.

f = function(x, data = mydata){
  y <- deparse(substitute(x))
  fmla <- paste(y, 'Species', sep = '~')
  lm(as.formula(fmla), data = data)
}

mydata <- iris
f(Sepal.Length)
#
#Call:
#lm(formula = as.formula(fmla), data = data)
#
#Coefficients:
#      (Intercept)  Speciesversicolor   Speciesvirginica  
#            5.006              0.930              1.582  

f(Petal.Width)
#
#Call:
#lm(formula = as.formula(fmla), data = data)
#
#Coefficients:
#      (Intercept)  Speciesversicolor   Speciesvirginica  
#            0.246              1.080              1.780
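
In the output above, the Call line reads as.formula(fmla) rather than the formula that was actually fitted. One possible way around that, sketched here as an assumption rather than as part of the original answer, is to splice the formula into the call with bquote() before evaluating it. The name f2 is hypothetical, and the built-in iris data is used as the default.

```r
# f2 is a hypothetical name; it mirrors f above but records the real formula.
f2 <- function(x, data = iris) {
  y <- deparse(substitute(x))                 # e.g. "Sepal.Length"
  fmla <- as.formula(paste(y, "~ Species"))   # e.g. Sepal.Length ~ Species
  # bquote() substitutes the formula object for .(fmla) before lm() is called
  eval(bquote(lm(.(fmla), data = data)))
}

fit <- f2(Sepal.Length)
fit$call$formula   # the formula stored in the call is now the real one
coef(fit)
```

The coefficients are unchanged; only the stored call differs, which helps when printing or comparing many fitted models.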

Upvotes: 2

Matt

Reputation: 7413

I think generally, you might be looking for:

printcol <- function(x){
  print(x)
}

printcol(mydata$A)

This doesn't involve any fancy evaluation; you just need to specify the variable you'd like to subset in your function call.

This gives us:

[1] 1 2 3

Note that you're only printing the vector A, and not actually subsetting column A from mydata.
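
For actual subsetting by name inside a function, a sketch along the following lines may help. It assumes the column name is passed as a string; g2 is a hypothetical name for a variant of g from the question. The point it illustrates is that mydata$x looks for a column literally called x, whereas mydata[[x]] uses the value stored in x.

```r
mydata <- data.frame(A = c(1, 2, 3),
                     B = c(4, 5, 6),
                     C = c(7, 8, 9))

x <- "B"
is.null(mydata$x)   # TRUE: there is no column literally named "x"
mydata[[x]]         # the B column, looked up via the value of x

# g2 is a hypothetical string-based variant of g from the question.
g2 <- function(x, data = mydata) {
  y <- data[[x]]            # extract the response column by name
  lm(y ~ A, data = data)    # y is found in the function's environment
}

coef(g2("B"))
coef(g2("C"))
```

One drawback of this approach is that the fitted model records its formula as y ~ A, so the printed Call no longer says which column was used.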

Upvotes: 1
