Noah

Reputation: 575

Regression for a Rate variable in R

I was tasked with developing a regression model of student enrollment in different programs. The data set is clean, and the enrollment counts follow a Poisson distribution well. I fit models in R (both a Poisson GLM and a zero-inflated Poisson model), and the resulting residuals seemed reasonable.

However, I was then instructed to change the count of students to a "rate", calculated as students / school_population (each school has its own population). This is no longer a count variable but a proportion between 0 and 1, considered the "proportion of enrollment" in a program.

This "rate" (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution, and subsequent model to represent it.

A log-normal distribution seems to fit this rate well, but I have many 0 values, and the log of 0 is undefined, so it can't actually be fitted.

Any suggestions on the best form of distribution for this new parameter, and how to model it in R?

Thanks!

Upvotes: 5

Views: 3815

Answers (1)

Ben Bolker

Reputation: 226761

As suggested in the comments, you could keep the Poisson model and handle the population with an offset:

glm(response ~ predictor1 + predictor2 + predictor3 + ... + offset(log(population)),
     family=poisson, data=...)
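
To make the offset idea concrete, here is a minimal sketch on simulated data. Everything in it (the `schools` data frame, `pop_size`, `x`, `enrolled`, and the true coefficients) is made up for illustration and is not from the question's data set. The offset fixes the coefficient of log(population) at 1, so the other coefficients describe the log of the rate.

```r
# Simulated data; all names and parameter values are hypothetical.
set.seed(1)
n <- 200
schools <- data.frame(
  pop_size = round(runif(n, 200, 2000)),  # school population
  x        = rnorm(n)                     # a single predictor
)
# Assumed true enrollment rate per student: exp(-3 + 0.5 * x)
schools$enrolled <- rpois(n, lambda = schools$pop_size * exp(-3 + 0.5 * schools$x))

# offset(log(pop_size)) enters the linear predictor with a fixed
# coefficient of 1, so the model is for log(enrolled / pop_size).
fit_pois <- glm(enrolled ~ x + offset(log(pop_size)),
                family = poisson, data = schools)
print(coef(fit_pois))

# Predicted rate at x = 0: set pop_size to 1 so the offset is log(1) = 0.
rate_hat <- predict(fit_pois,
                    newdata = data.frame(x = 0, pop_size = 1),
                    type = "response")
```

With `pop_size = 1` in `newdata`, the prediction is a rate per student rather than an expected count, which is usually what you want to report.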

Or you could use a binomial GLM, either

glm(cbind(response,pop_size-response) ~ predictor1 + ... , family=binomial,
        data=...)

or

glm(response/pop_size ~ predictor1 + ... , family=binomial,
        weights=pop_size,
        data=...)

The latter form is sometimes more convenient, although less widely used. Be aware that switching from Poisson to binomial will, by default, change the link function from log to logit, although you can use family=binomial(link="log") if you prefer.
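
As a quick check that the two binomial specifications describe the same model, the sketch below fits both to simulated data and compares the coefficients. The data frame `d` and the true parameter values are assumptions made up for this illustration.

```r
# Simulated data; names and parameter values are hypothetical.
set.seed(2)
n <- 150
d <- data.frame(pop_size = round(runif(n, 200, 2000)), x = rnorm(n))
d$response <- rbinom(n, size = d$pop_size, prob = plogis(-3 + 0.5 * d$x))

# Form 1: two-column matrix of (successes, failures)
fit_cbind <- glm(cbind(response, pop_size - response) ~ x,
                 family = binomial, data = d)

# Form 2: observed proportion, with the number of trials as weights
fit_prop <- glm(response / pop_size ~ x,
                family = binomial, weights = pop_size, data = d)

# The two sets of coefficients should agree up to numerical tolerance.
print(all.equal(coef(fit_cbind), coef(fit_prop)))
```

The second form keeps the proportion as a visible column in the formula, which can be handy if the rate has already been computed and stored in the data.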

Zero-inflation might be easier to model with the Poisson + offset combination (I'm not sure if the pscl package, the most common approach to ZIP, handles offsets, but I think it does), which will be more commonly available than a zero-inflated binomial model.
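
For what it's worth, here is a rough sketch of a zero-inflated Poisson fit with an offset in pscl, on simulated data. It relies on `zeroinfl()` accepting `offset()` terms in the count part of the formula, which is how I read its documentation; the data frame `zi_data`, the 30% structural-zero rate, and the coefficients are all made up for illustration.

```r
# Simulated data; names and parameter values are hypothetical.
set.seed(3)
n <- 300
zi_data <- data.frame(pop_size = round(runif(n, 200, 2000)), x = rnorm(n))
structural_zero <- rbinom(n, size = 1, prob = 0.3)  # assumed 30% extra zeros
zi_data$enrolled <- ifelse(
  structural_zero == 1, 0L,
  rpois(n, lambda = zi_data$pop_size * exp(-3 + 0.5 * zi_data$x))
)

# Only attempt the fit if pscl is installed.
if (requireNamespace("pscl", quietly = TRUE)) {
  # Count model with an offset to the left of "|",
  # intercept-only zero-inflation model to the right.
  fit_zip <- pscl::zeroinfl(enrolled ~ x + offset(log(pop_size)) | 1,
                            dist = "poisson", data = zi_data)
  print(coef(fit_zip))
}
```

The count-model coefficients are then on the log-rate scale, as in the plain Poisson + offset model, while the zero-inflation part is estimated separately on the logit scale.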

I think glmmADMB will do a zero-inflated binomial model, but I haven't tested it.

Upvotes: 7
