Meadowstrain
Meadowstrain

Reputation: 21

write lsmeans into a function with "%>%"

Using "mpg" data as an example, I wrote some code to call lsmeans function in the non-function format, and the output is good to me (as shown below).

I try to modify the codes into a function format to generate the same output, but the columns of the data could not be recognized in the function I wrote.

Errors reported from my function is:

Error in eval(predvars, data, env) : object 'cty' not found

My codes not in the function format that work fine:

library(rlang)
library(tidyverse)
library(dplyr)
library(multcompView)
library(lsmeans)


model = lm(cty ~ drv + class + drv:class,data=mpg)
anova(model)

marginal = lsmeans(model,~drv:class)

Pletters = multcomp::cld(marginal,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")
Pletters$.group=gsub(" ", "", Pletters$.group)
Pletters

Function code I wrote that did not work for me:

library(rlang)
library(tidyverse)
library(dplyr)
library(multcompView)
library(lsmeans)

P_letters<-function(data, y, groupby, subgroupby){

model = lm(y ~ groupby + subgroupby + groupby:subgroupby,data=data)
anova(model)

marginal = lsmeans(model,~groupby:subgroupby)

Pletters = multcomp::cld(marginal,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")
Pletters$.group=gsub(" ", "", Pletters$.group)
Pletters
}

Call the function with "mpg" data:

result<-mpg %>%  
P_letters(y=cty, groupby=drv, subgroupby=class)
result

Output from the non-function format codes:

nalysis of Variance Table

Response: cty
           Df  Sum Sq Mean Sq  F value Pr(>F)    
drv         2 1878.81  939.41 136.6198 <2e-16 ***
class       6  804.78  134.13  19.5069 <2e-16 ***
drv:class   3   10.26    3.42   0.4974 0.6844    
Residuals 222 1526.49    6.88                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 drv class      lsmean    SE  df lower.CL upper.CL .group
 r   suv          12.0 0.791 222     9.58     14.4 a     
 4   pickup       13.0 0.456 222    11.60     14.4 a     
 4   suv          13.8 0.367 222    12.70     14.9 a     
 r   2seater      15.4 1.173 222    11.80     19.0 ab    
 f   minivan      15.8 0.791 222    13.39     18.2 ab    
 r   subcompact   15.9 0.874 222    13.21     18.6 ab    
 4   midsize      16.0 1.514 222    11.36     20.6 abc   
 4   compact      18.0 0.757 222    15.68     20.3 bc    
 f   midsize      19.0 0.425 222    17.67     20.3 bc    
 4   subcompact   19.5 1.311 222    15.48     23.5 bcd   
 f   compact      20.9 0.443 222    19.50     22.2 cd    
 f   subcompact   22.4 0.559 222    20.65     24.1 d     
 4   2seater    nonEst    NA  NA       NA       NA       
 f   2seater    nonEst    NA  NA       NA       NA       
 r   compact    nonEst    NA  NA       NA       NA       
 r   midsize    nonEst    NA  NA       NA       NA       
 4   minivan    nonEst    NA  NA       NA       NA       
 r   minivan    nonEst    NA  NA       NA       NA       
 f   pickup     nonEst    NA  NA       NA       NA       
 r   pickup     nonEst    NA  NA       NA       NA       
 f   suv        nonEst    NA  NA       NA       NA       

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 21 estimates 
P value adjustment: tukey method for comparing a family of 21 estimates 
significance level used: alpha = 0.05 

Error from my function format code:

Error in eval(predvars, data, env) : object 'cty' not found

Upvotes: 2

Views: 252

Answers (1)

Artem Sokolov
Artem Sokolov

Reputation: 13701

Some R functions operate directly on the expressions that a user writes, instead of evaluating them. It's a fairly advanced concept known as non-standard evaluation (NSE), and you can learn more about it in the recent tidy evaluation guide. As a very brief example of NSE, consider the library() function:

library("tidyverse")              # Works

a <- "tidyverse"
library( a )
# Error in library(a) : there is no package called ‘a’

If library() used standard evaluation rules, the expression a would have been evaluated to produce the value "tidyverse", which would then be passed to library(). However, because of NSE, the function works directly with a, instead of evaluating it.

The functions you are trying to use fall under the same umbrella and require special treatment to handle user-supplied expressions. The rlang package provides several mechanisms for this. In particular, we will use 1) ensym() to capture the variable name supplied to the function; 2) expr() to compose an unevaluated expression; 3) !! operator to force expression evaluation.

P_letters <- function(data, y, groupby, subgroupby) {
  ## Compose a formula expression using user-supplied variable names
  frml <- expr( !!ensym(y) ~ !!ensym(groupby) + !!ensym(subgroupby) +
                    !!ensym(groupby):!!ensym(subgroupby) )

  ## Pass the formula expression to lm()
  model <- lm(frml,data=data)
  print(anova(model))

  ## Compose and evaluate an expression for lsmean() call
  lsm_expr <- expr( lsmeans(model,~!!ensym(groupby):!!ensym(subgroupby)) )
  marginal <- eval(lsm_expr)

  Pletters <- multcomp::cld(marginal, alpha=0.05,
                          Letters=letters, adjust="tukey")
  Pletters$.group=gsub(" ", "", Pletters$.group)
  Pletters
}

## This now produces the desired output
mpg %>% P_letters( y=cty, groupby=drv, subgroupby=class ) 

Upvotes: 0

Related Questions