Reputation: 6289
I have just started using R. An example of linear regression looks like this:
lm(y ~ x1 + x2 + x3, data)
It appears that formulas passed to the lm function can contain variable names that are not in scope. How does this work? How is the formula interpreted by R?
I have already tried reading the source code of lm, but couldn't make any sense of it.
Upvotes: 2
Views: 988
Reputation: 206566
When you pass a formula and a data= argument, lm first tries to resolve the variable names in the supplied data.frame, so y, x1, x2 and x3 should be column names in data. Any name not found in the data.frame is then looked up in the formula's environment, which is normally the environment lm was called from.
# Example
x9 <- runif(15)
data <- data.frame(x1 = runif(15), x2 = rnorm(15))
data <- transform(data, y = 3 * x1 - 2 * x9 - 2 + rnorm(15))
# Here y, x1 and x2 are resolved within data, and x9 comes from the calling environment
lm(y ~ x1 + x2 + x9, data)
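To see why names that are "not in scope" do not cause an error, it can help to inspect the formula itself. The following is only a rough sketch: it assumes the lookup is done by model.frame, which lm calls internally, and the names f and mf are purely illustrative.

```r
# A formula is stored unevaluated: creating it does not look up any variable.
f <- y ~ x1 + x2 + x9
class(f)        # "formula"
all.vars(f)     # "y" "x1" "x2" "x9"
environment(f)  # the environment in which f was created

# x9 lives in the calling environment; y, x1 and x2 are columns of data.
x9 <- runif(15)
data <- data.frame(x1 = runif(15), x2 = rnorm(15))
data$y <- 3 * data$x1 - 2 * x9 - 2 + rnorm(15)

# A variable with the same name as a column does not shadow the column.
x1 <- rep(100, 15)

# The lookup happens only when the model frame is built.
mf <- model.frame(f, data = data)
names(mf)                  # "y" "x1" "x2" "x9"
all.equal(mf$x1, data$x1)  # TRUE: the column in data won over the outer x1
```

So the formula only records the expression and an environment. The names are resolved later, first in data and then in environment(f).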
Upvotes: 3