wallflower
wallflower

Reputation: 437

How to loop with repeated values within a variable?

I have a dataset called bjmd that looks like this (simplified):

      rte   year   y  obs
22037 46001  1     0   1
22042 46001  2     4   3
22047 46001  3     5   3
22202 46002  1    11   1
22207 46002  2    14   1
22212 46002  3     6   1
22140 46003  1     5   6
22141 46003  2     2   6
22142 46003  3     6   6

I want to run a loop to conduct a glm analysis for each distinct rte (46001,46002, 46003). Within each rte, there are multiple years and they all need to be included in the glm analysis. From each route's glm test, I am taking the slope and creating another table with route and slope as columns. This is what I want it to look like:

rte    slope
46001   x
46002   y
46003   z

Here is the for loop code I came up with:

route<-with(bjmd,unique(rte))
slope<-with(bjmd,numeric(length(unique(rte))))
table<-data.frame(route,slope)
for (i in unique(as.factor(bjmd$rte))) {
  data<-subset(bjmd, rte=='i')
  slope[i] <- coef(summary(glm(y ~  year+obs,
                               family = poisson(link=log),data=data)))[2,1]
  table[i,2] <-paste(slope[i])
})
table

Something is wrong with this code as I keep getting 0 values for my slope:

  route slope
1 46001     0
2 46002     0
3 46003     0

Can somebody please help point out where I am messing it up?

Upvotes: 0

Views: 123

Answers (1)

Hong Ooi
Hong Ooi

Reputation: 57686

No looping is needed; just use split to split your dataset into groups according to rte. Then fit a model to each group with lapply.

lapply(split(bjmd, bjmd$rte), function(dat) glm(y ~ year + obs, data=dat))

You could also model everything in one go, with an interaction term. The predicted values will be the same, but the residual deviance, df and hence P-values will be different. Which approach is better suited to your needs depends on your project.

glm(y ~ (year + obs) * factor(rte), data=bjmd)

Upvotes: 1

Related Questions