Reputation: 6921
I want to use a logistic regression to actually perform regression and not classification.
My response variable is numeric between 0 and 1 and not categorical. This response variable is not related to any kind of binomial process. In particular, there is no "success", no "number of trials", etc. It is simply a real variable taking values between 0 and 1 depending on circumstances.
Here is a minimal example to illustrate what I want to achieve
dummy_data <- data.frame(a=1:10,
b=factor(letters[1:10]),
resp = runif(10))
fit <- glm(formula = resp ~ a + b,
family = "binomial",
data = dummy_data)
This code gives a warning then fails because I am trying to fit the "wrong kind" of data:
In eval(family$initialize) : non-integer #successes in a binomial glm!
Yet I think there must be a way since the help of family
says:
For the binomial and quasibinomial families the response can be specified in one of three ways: [...] (2) As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).
Somehow the same code works using "quasibinomial"
as the family which makes me think there may be a way to make it work with a binomial glm.
I understand the likelihood is derived with the assumption that $y_i$ is in ${0, 1}$ but, looking at the maths, it seems like the log-likelihood still makes sense with $y_i$ in $[0, 1]$. Am I wrong?
Upvotes: 1
Views: 4749
Reputation: 2381
From the discussion at Warning: non-integer #successes in a binomial glm! (survey packages), I think we can solve it by another family function ?quasibinomial()
.
dummy_data <- data.frame(a=1:10,
b=factor(letters[1:10]),
resp = runif(10),w=round(runif(10,1,11)))
fit2 <- glm(formula = resp ~ a + b,
family = quasibinomial(),
data = dummy_data, weights = w)
Upvotes: 3
Reputation: 1087
This is because you are using the binomial family and giving the wrong output. Since the family chosen is binomial, this means that the outcome has to be either 0 or 1, not the probability value.
This code works fine, because the response is either 0 or 1.
dummy_data <- data.frame(a=1:10,
b=factor(letters[1:10]),
resp = sample(c(0,1),10,replace=T,prob=c(.5,.5)) )
fit <- glm(formula = resp ~ a + b,
family = binomial(),
data = dummy_data)
If you want to model the probability directly you should include an additional column with the total number of cases. In this case the probability you want to model is interpreted as the success rate given the number of case in the weights column.
dummy_data <- data.frame(a=1:10,
b=factor(letters[1:10]),
resp = runif(10),w=round(runif(10,1,11)))
fit <- glm(formula = resp ~ a + b,
family = binomial(),
data = dummy_data, weights = w)
You will still get the warning message, but you can ignore it, given these conditions:
resp
is the proportion of 1's in n trials.
for each value in resp
, the corresponding value in w
is the number of trials.
Upvotes: 2