David Graff

Reputation: 31

Vectorize glm and extract all information

I have a data set "keywords" with several groups. I want to apply glm to each group individually to create a list of glm fits with one fit for each group.

I could do this with a for loop, but that's not in the R spirit. Instead, I tried to do it with the by function:

CTR.glm <- by(keywords,keywordsInSample,
          function(x) ifelse(nlevels(factor(x$AveragePosition))>20, # only these keywords will be fit
                             glm(Clicks ~ poly(log(AveragePosition),2) + offset(log(Impressions)),
                                 family = poisson,data = x),
                             NA)) # for functions that can't be fit

The problem is that, whereas glm normally returns a glm-class object from which I can extract all sorts of goodies, by returns a list:

> CTR.glm[2]
$`text of second keyword`
               (Intercept) poly(log(AveragePosition), 2)1 poly(log(AveragePosition), 2)2 
                 -3.626237                      -5.108795                      -1.751032 
> class(CTR.glm[2])
[1] "list"

All information has been lost except for the parameters of the fit. Is there a way to force by to keep all the information about each fit?

p.s., I tried using the plyr toolbox, but it got stuck because my keywords have spaces in them.

p.p.s., this post should have the tag "by", but I can't create that tag (new to stackoverflow), could someone retag it?

Upvotes: 1

Views: 491

Answers (2)

AndrewMacDonald

Reputation: 2950

Try

lapply(CTR.glm,summary)

The list probably contains model objects, which still have the information you need.
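
If the elements turn out to hold only coefficients, a likely culprit is ifelse(): it returns a value shaped like its test, so with a length-one test it keeps just the first element of the glm object, which is the coefficient vector. A minimal sketch with a plain if/else, using made-up data in place of keywords (the keyword column, the toy values, and the fit_one helper are assumptions for illustration, not from the question):

```r
# Hypothetical toy data standing in for the real `keywords` data set
set.seed(1)
keywords <- data.frame(
  keyword         = rep(c("red shoes", "blue hat"), each = 30),
  AveragePosition = c(seq(1, 8, length.out = 30), rep(c(1, 2), 15)),
  Impressions     = rep(1000, 60)
)
keywords$Clicks <- rpois(60, lambda = 50 / keywords$AveragePosition)

# Hypothetical helper: fit one group, or return NA if it cannot be fit
fit_one <- function(x) {
  # Plain if/else hands back the whole glm object; ifelse() would keep
  # only its first element (the coefficients)
  if (nlevels(factor(x$AveragePosition)) > 20) {
    glm(Clicks ~ poly(log(AveragePosition), 2) + offset(log(Impressions)),
        family = poisson, data = x)
  } else {
    NA  # too few distinct positions to fit
  }
}

CTR.glm <- by(keywords, keywords$keyword, fit_one)
class(CTR.glm[["red shoes"]])  # a glm object, so summary(), predict(), etc. work
```

Note that `[[` is used to pull out a single fit; `CTR.glm[2]` with single brackets returns a one-element list, which is why its class prints as "list".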

Upvotes: 2

Drew Steen

Reputation: 16607

I think plyr should work just fine. I don't know the structure of your keywords and keywordsInSample, but consider that this toy example works fine:

library(plyr)
# generate some fake data, with a factor whose level names contain spaces
grp <- c(rep("a a", 3), rep("a", 3), rep("b b", 3))
x <- rep(1:3, 3)
y <- rnorm(9)
d <- data.frame(keywordsInSample = grp, x = x, y = y)

# fit one glm per group; dlply returns a list of full glm objects
lmList <- dlply(d, .(keywordsInSample), function(df) glm(y ~ x, data = df))
lmList$"a a"

As long as your index variable can be coerced to a factor, R stores it internally as integer codes, so it shouldn't care what the level names contain, spaces included.
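
If plyr still gives trouble, the same per-group list can be built in base R. This is only a sketch on invented toy data (the column names and values are assumptions), using split() and lapply(), and then pulling a couple of quantities out of every fit:

```r
# Toy data with group names that contain spaces (values are made up)
set.seed(42)
d <- data.frame(
  keywordsInSample = rep(c("a a", "a", "b b"), each = 3),
  x = rep(1:3, 3),
  y = rnorm(9)
)

# split() names each piece by its group level, spaces and all;
# lapply() then returns one full glm object per group
fits <- lapply(split(d, d$keywordsInSample),
               function(df) glm(y ~ x, data = df))

class(fits[["a a"]])            # each element keeps its glm class
coefs <- t(sapply(fits, coef))  # one row of coefficients per group
aics  <- sapply(fits, AIC)      # one AIC value per group
```

Because each element is a complete fit, any extractor that works on a single glm (summary, residuals, predict) can be applied across the list in the same way.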

Upvotes: 0
