asachet

Reputation: 6921

Fit binomial GLM on probabilities (i.e. using logistic regression for regression not classification)

I want to use a logistic regression to actually perform regression and not classification.

My response variable is numeric between 0 and 1 and not categorical. This response variable is not related to any kind of binomial process. In particular, there is no "success", no "number of trials", etc. It is simply a real variable taking values between 0 and 1 depending on circumstances.

Here is a minimal example to illustrate what I want to achieve

dummy_data <- data.frame(a=1:10, 
                         b=factor(letters[1:10]), 
                         resp = runif(10))
fit <- glm(formula = resp ~ a + b, 
           family = "binomial",
           data = dummy_data)

This code gives a warning then fails because I am trying to fit the "wrong kind" of data:

In eval(family$initialize) : non-integer #successes in a binomial glm!

Yet I think there must be a way since the help of family says:

For the binomial and quasibinomial families the response can be specified in one of three ways: [...] (2) As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).

Somehow the same code works using "quasibinomial" as the family which makes me think there may be a way to make it work with a binomial glm.

I understand the likelihood is derived under the assumption that $y_i$ is in $\{0, 1\}$ but, looking at the maths, it seems like the log-likelihood still makes sense with $y_i$ in $[0, 1]$. Am I wrong?
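To make the question concrete, this is the standard Bernoulli log-likelihood with a logit link. The symbols $p_i$, $x_i$ and $\beta$ are notation assumed here for the fitted value, the covariate vector and the coefficients; they are not defined elsewhere in the post.

$$\ell(\beta) = \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \qquad p_i = \frac{1}{1 + e^{-x_i^\top \beta}}$$

$$\frac{\partial \ell}{\partial \beta} = \sum_i (y_i - p_i)\, x_i$$

Both expressions are well defined for any $y_i$ in $[0, 1]$, not only for $y_i$ in $\{0, 1\}$.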

Upvotes: 1

Views: 4749

Answers (2)

Shixiang Wang

Reputation: 2381

Based on the discussion in "Warning: non-integer #successes in a binomial glm! (survey packages)", I think this can be solved by using a different family function, quasibinomial() (see ?quasibinomial).

dummy_data <- data.frame(a = 1:10, 
                         b = factor(letters[1:10]), 
                         resp = runif(10),
                         w = round(runif(10, 1, 11)))

fit2 <- glm(formula = resp ~ a + b, 
           family = quasibinomial(),
           data = dummy_data, weights = w)
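As a rough check of what switching families changes, the sketch below fits the same model with both families on made-up data. The data frame d, the predictor x and the object names fit_q and fit_b are hypothetical and only serve the illustration; the response is assumed to be a continuous value strictly between 0 and 1.

```r
# Illustrative data only: a continuous response in (0, 1), no trials involved
set.seed(1)
d <- data.frame(x = seq(-2, 2, length.out = 50))
d$resp <- plogis(0.5 + d$x + rnorm(50, sd = 0.5))

# quasibinomial: fits without the non-integer warning
fit_q <- glm(resp ~ x, family = quasibinomial(), data = d)

# binomial: same model; the warning is silenced here only to compare results
fit_b <- suppressWarnings(glm(resp ~ x, family = binomial(), data = d))

# The point estimates should coincide
print(all.equal(coef(fit_q), coef(fit_b)))

# The standard errors differ: quasibinomial estimates a dispersion parameter,
# whereas binomial fixes it at 1
print(summary(fit_q)$dispersion)
print(summary(fit_b)$dispersion)
```

Both families use the same link and variance function, so the fitted coefficients match; what differs is how the standard errors are scaled.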


Upvotes: 3

Marco De Virgilis

Reputation: 1087

This happens because you are using the binomial family with the wrong kind of response. With family = binomial() and no weights, glm expects each outcome to be either 0 or 1, not a probability value.

This code works fine, because the response is either 0 or 1.

dummy_data <- data.frame(a=1:10, 
                         b=factor(letters[1:10]), 
                         resp = sample(c(0, 1), 10, replace = TRUE, prob = c(.5, .5)))

fit <- glm(formula = resp ~ a + b, 
           family = binomial(),
           data = dummy_data)

If you want to model the probability directly, you should include an additional column with the total number of cases. The probability you want to model is then interpreted as the success rate, given the number of cases in the weights column.

dummy_data <- data.frame(a = 1:10, 
                         b = factor(letters[1:10]), 
                         resp = runif(10),
                         w = round(runif(10, 1, 11)))

fit <- glm(formula = resp ~ a + b, 
           family = binomial(),
           data = dummy_data, weights = w)

You will still get the warning message, but you can ignore it, given these conditions:

  1. resp is the proportion of 1's in n trials.

  2. for each value in resp, the corresponding value in w is the number of trials.
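When both conditions hold exactly, the warning should not appear at all. The sketch below uses made-up counts to illustrate this; the column names x and successes and the object names fit_prop and fit_cnt are hypothetical. It also compares the proportion-plus-weights form with the two-column cbind(successes, failures) form that glm accepts as well.

```r
# Illustrative counts only: w trials per row, of which `successes` succeeded
d <- data.frame(
  x         = 1:10,
  w         = c(5, 8, 10, 4, 6, 9, 7, 3, 10, 5),
  successes = c(1, 2,  4, 1, 3, 5, 4, 2,  8, 4)
)
d$resp <- d$successes / d$w   # proportion of 1's in w trials

# Proportion as response, number of trials as weights
fit_prop <- glm(resp ~ x, family = binomial(), data = d, weights = w)

# Same model written as a two-column (successes, failures) response
fit_cnt <- glm(cbind(successes, w - successes) ~ x,
               family = binomial(), data = d)

# The two specifications should give the same coefficients
print(all.equal(coef(fit_prop), coef(fit_cnt)))
```

Because resp * w is a whole number in every row, the integer check inside the binomial family passes. With resp drawn from runif(), as in the example above, that product is generally not a whole number, which is why the warning still shows up there.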

Upvotes: 2
