Reputation: 437
I have a dataset called bjmd
that looks like this (simplified):
rte year y obs
22037 46001 1 0 1
22042 46001 2 4 3
22047 46001 3 5 3
22202 46002 1 11 1
22207 46002 2 14 1
22212 46002 3 6 1
22140 46003 1 5 6
22141 46003 2 2 6
22142 46003 3 6 6
I want to run a loop to conduct a glm
analysis for each distinct rte
(46001,46002, 46003). Within each rte
, there are multiple year
s and they all need to be included in the glm
analysis. From each route's glm
test, I am taking the slope and creating another table with route and slope as columns. This is what I want it to look like:
rte slope
46001 x
46002 y
46003 z
Here is the for loop code I came up with:
route<-with(bjmd,unique(rte))
slope<-with(bjmd,numeric(length(unique(rte))))
table<-data.frame(route,slope)
for (i in unique(as.factor(bjmd$rte))) {
data<-subset(bjmd, rte=='i')
slope[i] <- coef(summary(glm(y ~ year+obs,
family = poisson(link=log),data=data)))[2,1]
table[i,2] <-paste(slope[i])
})
table
Something is wrong with this code as I keep getting 0 values for my slope:
route slope
1 46001 0
2 46002 0
3 46003 0
Can somebody please help point out where I am messing it up?
Upvotes: 0
Views: 123
Reputation: 57686
No looping is needed; just use split
to split your dataset into groups according to rte
. Then fit a model to each group with lapply
.
lapply(split(bjmd, bjmd$rte), function(dat) glm(y ~ year + obs, data=dat))
You could also model everything in one go, with an interaction term. The predicted values will be the same, but the residual deviance, df and hence P-values will be different. Which approach is better suited to your needs depends on your project.
glm(y ~ (year + obs) * factor(rte), data=bjmd)
Upvotes: 1