jay.sf

How to pass a formula as a parameter to lm in sapply?

It seems that lm won't accept a formula stored in a variable when it is called inside sapply.

Just lm

Called on its own, lm accepts the formula object FO without problems:

summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]

lm in sapply

Inside sapply, the first call below (literal formula) works, but the second (formula passed as FO)

sapply(unique(df1$z), function(s) 
  summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
sapply(unique(data[[st]]), function(s) 
  summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])

causes the error:

 Error in eval(substitute(subset), data, env) : object 's' not found 

When everything except the formula FO is passed as a variable, it still works:

sapply(unique(data[[st]]), function(s) 
  summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])

lm in for loop

Within a for loop, all arguments, including FO, can be passed as variables:

m <- matrix(NA, 4, length(unique(data[[st]])))
for (s in unique(data[[st]])) {
  m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
}
m
#           [,1]       [,2]         [,3]
# [1,] 1.6269038 -0.1404174 -0.010338774
# [2,] 0.9042738  0.4577001  1.858138516
# [3,] 1.7991275 -0.3067890 -0.005564049
# [4,] 0.3229600  0.8104951  0.996457853

Data:

df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089, 0.363128411337339, 
0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894, 
-0.0946590384130976, 2.01842371387704), y = c(1.30824434809425, 
0.740171482827397, 2.64977380403845, -0.755998096151299, 0.125479556323628, 
-0.239445852485142, 2.14747239550901, -0.37891195982917, -0.638031707027734
), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8, 
1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA, 
-9L))

FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1

sessionInfo():

R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252   
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
[5] LC_TIME=German_Switzerland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0    yaj_0.0.0.9044 packrat_0.5.0 

Answers (2)

jay.sf

Thanks to a hint from @David on R-help to try do.call, I could figure it out. The solution is:

sapply(unique(data[[st]]), function(s)
  summary(do.call("lm", list(FO, data, data[[st]] == s, 
                             data[[ws]])))$coef[1, ])
#                 [,1]       [,2]         [,3]
# Estimate   1.6269038 -0.1404174 -0.010338774
# Std. Error 0.9042738  0.4577001  1.858138516
# t value    1.7991275 -0.3067890 -0.005564049
# Pr(>|t|)   0.3229600  0.8104951  0.996457853

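Explanation (credits to @Duncan from R-help): lm evaluates the subset and weights arguments in data and then in the environment attached to the formula, not in the function that sapply calls. Here environment(FO) is <environment: R_GlobalEnv>, where the formula was created, so s is not found there. do.call evaluates the argument list first, so lm receives the finished logical vector and never has to look up s. In the for loop s is itself a global variable, which is why that version works.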
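If the explanation above is right, another way to get the same result should be to re-attach a copy of the formula to the environment of the function that sapply calls, so that s becomes visible to lm. The following is a minimal sketch under that assumption, not a recommendation over do.call. The names f, res_env and res_docall are made up for illustration; the data and settings are copied from the question.

```r
# Data and settings copied from the question.
df1 <- data.frame(
  x = c(1.37095844714667, -0.564698171396089, 0.363128411337339,
        0.63286260496104, 0.404268323140999, -0.106124516091484,
        1.51152199743894, -0.0946590384130976, 2.01842371387704),
  y = c(1.30824434809425, 0.740171482827397, 2.64977380403845,
        -0.755998096151299, 0.125479556323628, -0.239445852485142,
        2.14747239550901, -0.37891195982917, -0.638031707027734),
  z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)
)
FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"

# Variant 1 (assumption, not from the thread): give a local copy of the
# formula the environment of the anonymous function, where `s` is defined.
res_env <- sapply(unique(data[[st]]), function(s) {
  f <- FO                          # local copy, FO itself stays untouched
  environment(f) <- environment()  # subset/weights are now looked up here
  summary(lm(f, data, data[[st]] == s, data[[ws]]))$coefficients[1, ]
})

# Variant 2: the do.call solution from the answer, for comparison.
res_docall <- sapply(unique(data[[st]]), function(s)
  summary(do.call("lm", list(FO, data, data[[st]] == s,
                             data[[ws]])))$coefficients[1, ])

# Both variants should give the same 4 x 3 matrix of intercept statistics.
all.equal(res_env, res_docall)
```

The copy f is used so that the global FO keeps its original environment; whether this is preferable to do.call is a matter of taste.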

mcz

This worked when I tried it. It seems like your use of x in the formula is interfering with the way you'd like the function to behave. Replacing this argument with num generates the results it sounds like you're looking for. This way it ensures that x in the formula refers to the dataset instead of the function argument.

sapply(unique(dat$z), function(num) summary(lm(y ~ x, dat, z == num))$coef[1, ])
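To illustrate the name clash described here, the sketch below assumes the function argument was originally called x, like the column in the formula. Inside lm, names in subset are looked up in the data first, so a same-named argument is masked by the column. The data frame dat is hypothetical toy data (only the column names follow the call above), and n_masked and n_clean are made-up names.

```r
# Hypothetical toy data; only the column names x, y, z follow the answer.
dat <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(1.1, 1.9, 3.2, 0.8, 2.1, 2.9),
  z = c(1, 1, 1, 2, 2, 2)
)

# Argument named `x`: inside `subset`, `x` resolves to the column dat$x,
# so the condition is effectively z == dat$x and the argument is ignored.
n_masked <- sapply(unique(dat$z), function(x)
  nobs(lm(y ~ x, dat, z == x)))

# Argument named `num`: no column has that name, so the argument is used
# and each fit sees exactly the rows of one group.
n_clean <- sapply(unique(dat$z), function(num)
  nobs(lm(y ~ x, dat, z == num)))

n_masked  # 2 rows per fit: only rows where z happens to equal x
n_clean   # 3 rows per fit: the full group
```

Counting the observations used by each fit is just a cheap way to see which rows the subset selected; it is not part of the original answer.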
