Reputation:
I am new to R and I am just learning about the apply
functions and how they work. I simply want to extract the coefficients from a lm
fit on a variable x by product color and brand across a several years.
I know I can create a for loop and subset the data by model year and fit it, but I think its time I start using more built in functions, so I want to be able to do it with the by function or one of the apply functions. Here is what I was thinking.
#some made up data
x<-rnorm(50,13400,1200)
color<-sample(factor(c("Red","Black","Blue","Green","White")),50,replace=T)
year<-sample(factor(2006:2012),50,replace=T)
brand<-sample(factor(c("A","B","C","D")),50,replace=T)
d<-data.frame(x,color,year,brand)
#now I want to fit the model lm(x~color+brand) for each year level
#this is what I was thinking...
tmp<-with(d,by(x,year,function(y) lm(x~color,data=y)))
sapply(tmp,coef)
Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
I am basing this off the exapmle R gave when I entered help(by)
Upvotes: 5
Views: 1926
Reputation: 34703
This is much easier in more modern packages, e.g. data.table
:
library(data.table)
setDT(d)
d[ , .(reg = list(lm(x ~ color))), by = year]
# year reg
# 1: 2012 <lm>
# 2: 2006 <lm>
# 3: 2011 <lm>
# 4: 2008 <lm>
# 5: 2007 <lm>
# 6: 2010 <lm>
# 7: 2009 <lm>
The reg
column has lm
objects; note we need to wrap lm
in list(.)
so that data.table
doesn't confuse the plain list (note that is.list(lm(x ~ color, data = d))
is TRUE
.
Upvotes: 2
Reputation: 173547
In addition to doing this with base R functions (which is a good idea to learn how they work) it's also worth looking at who may have worked on this before. Fitting linear models to each subset of a data set is something that comes up a bunch, and it turns out people have built some convenient tools specifically for this.
Try looking at the package lme4 and the function lmList
. From the example in ?lmList
,
(fm1 <- lmList(Reaction ~ Days | Subject, sleepstudy))
which fits the linear model Reaction ~ Days
separately for each Subject
. This is conveient because extractor methods work directly on fm1
:
> coef(fm1)
(Intercept) Days
308 244.1927 21.764702
309 205.0549 2.261785
310 203.4842 6.114899
330 289.6851 3.008073
331 285.7390 5.266019
332 264.2516 9.566768
333 275.0191 9.142045
334 240.1629 12.253141
335 263.0347 -2.881034
337 290.1041 19.025974
349 215.1118 13.493933
350 225.8346 19.504017
351 261.1470 6.433498
352 276.3721 13.566549
369 254.9681 11.348109
370 210.4491 18.056151
371 253.6360 9.188445
372 267.0448 11.298073
(the row numbers are the id values for the Subjects
) See the help file at ?lmList
to see what other methods are available for things like confidence intervals, plotting, etc.
Upvotes: 5
Reputation: 44614
Try this instead:
tmp <- by(d, year, function(d.subset) lm(x~color, data=d.subset))
Upvotes: 6