user1392795
user1392795

Reputation:

Applying by function to lm()

I am new to R and I am just learning about the apply functions and how they work. I simply want to extract the coefficients from a lm fit on a variable x by product color and brand across a several years.

I know I can create a for loop and subset the data by model year and fit it, but I think its time I start using more built in functions, so I want to be able to do it with the by function or one of the apply functions. Here is what I was thinking.

#some made up data

x<-rnorm(50,13400,1200)
color<-sample(factor(c("Red","Black","Blue","Green","White")),50,replace=T)
year<-sample(factor(2006:2012),50,replace=T)
brand<-sample(factor(c("A","B","C","D")),50,replace=T)

d<-data.frame(x,color,year,brand)

#now I want to fit the model lm(x~color+brand) for each year level
#this is what I was thinking...

tmp<-with(d,by(x,year,function(y) lm(x~color,data=y)))
sapply(tmp,coef)

Error in eval(predvars, data, env) : numeric 'envir' arg not of length one

I am basing this off the exapmle R gave when I entered help(by)

Upvotes: 5

Views: 1926

Answers (3)

MichaelChirico
MichaelChirico

Reputation: 34703

This is much easier in more modern packages, e.g. data.table:

library(data.table)
setDT(d)
d[ , .(reg = list(lm(x ~ color))), by = year]
#    year  reg
# 1: 2012 <lm>
# 2: 2006 <lm>
# 3: 2011 <lm>
# 4: 2008 <lm>
# 5: 2007 <lm>
# 6: 2010 <lm>
# 7: 2009 <lm>

The reg column has lm objects; note we need to wrap lm in list(.) so that data.table doesn't confuse the plain list (note that is.list(lm(x ~ color, data = d)) is TRUE.

Upvotes: 2

joran
joran

Reputation: 173547

In addition to doing this with base R functions (which is a good idea to learn how they work) it's also worth looking at who may have worked on this before. Fitting linear models to each subset of a data set is something that comes up a bunch, and it turns out people have built some convenient tools specifically for this.

Try looking at the package lme4 and the function lmList. From the example in ?lmList,

(fm1 <- lmList(Reaction ~ Days | Subject, sleepstudy))

which fits the linear model Reaction ~ Days separately for each Subject. This is conveient because extractor methods work directly on fm1:

> coef(fm1)
    (Intercept)      Days
308    244.1927 21.764702
309    205.0549  2.261785
310    203.4842  6.114899
330    289.6851  3.008073
331    285.7390  5.266019
332    264.2516  9.566768
333    275.0191  9.142045
334    240.1629 12.253141
335    263.0347 -2.881034
337    290.1041 19.025974
349    215.1118 13.493933
350    225.8346 19.504017
351    261.1470  6.433498
352    276.3721 13.566549
369    254.9681 11.348109
370    210.4491 18.056151
371    253.6360  9.188445
372    267.0448 11.298073

(the row numbers are the id values for the Subjects) See the help file at ?lmList to see what other methods are available for things like confidence intervals, plotting, etc.

Upvotes: 5

Matthew Plourde
Matthew Plourde

Reputation: 44614

Try this instead:

tmp <- by(d, year, function(d.subset) lm(x~color, data=d.subset))

Upvotes: 6

Related Questions