Reputation: 133
I have a tibble with 2 columns and 200 rows. The first column is OS, for Overall Survival in days, and the second column is Sex, which is either "m" or "f". There are no missing values anywhere.
I run a multivariable regression with finalfit like this:
library(tidyverse)
library(finalfit)
fit <- data %>%
  finalfit("OS", "Sex", metrics = TRUE)
and get the following output:
[[1]]
Dependent: OS     unit       value          Coefficient (univariable)          Coefficient (multivariable)
Sex            f  Mean (sd)  568.8 (380.6)  -                                  -
               m  Mean (sd)  601.5 (378.0)  32.75 (-77.03 to 142.52, p=0.557)  32.75 (-77.03 to 142.52, p=0.557)
So it produces no coefficient for the female level of Sex, only for the male level. I have the same problem with other categorical variables, where the function returns no coefficient for one level of the variable.
I do not have this problem with continuous variables, e.g. using Age instead of Sex.
I am at a loss as to why this is. Any help on where to start debugging is appreciated.
Upvotes: 0
Views: 122
Reputation: 21757
This is the desired behaviour of the model. Let's say you want to predict y with sex, which has values m and f. What you're looking for is a coefficient for both m and f. That would give you a design matrix that looks like:
Intercept  m  f
1          1  0
1          0  1
...
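You can see that Intercept, m and f are perfectly collinear: because f = Intercept - m, f is an exact linear combination of Intercept and m. That means there is no unique solution for all three coefficients. The model therefore drops one level (here f) and treats it as the reference level, so the coefficient for m is the difference between the m and f means. That matches your output: 601.5 - 568.8 is 32.75 up to rounding. The model you're estimating with finalfit() is providing you with all the relevant information.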
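The collinearity argument above can be checked with a minimal sketch in base R. The toy data frame and its values are made up for illustration (they are not the asker's data), and the sketch calls lm() directly rather than finalfit(), on the assumption that the same default treatment coding of factors applies.

```r
# Toy data: hypothetical values, NOT the asker's tibble
toy <- data.frame(
  OS  = c(500, 620, 580, 700, 450, 610),
  Sex = factor(c("f", "m", "f", "m", "f", "m"))
)

# Over-parameterised design: an intercept plus one dummy column per level
X_full <- cbind(
  Intercept = 1,
  m = as.numeric(toy$Sex == "m"),
  f = as.numeric(toy$Sex == "f")
)
# Three columns but only rank 2, because f = Intercept - m
qr(X_full)$rank

# R's default treatment coding drops the first level ("f") as the reference
X_default <- model.matrix(~ Sex, data = toy)
colnames(X_default)

# Intercept = mean OS for "f"; Sexm = mean OS for "m" minus mean OS for "f"
fit <- lm(OS ~ Sex, data = toy)
coef(fit)

# To report a coefficient for "f" instead, make "m" the reference level
toy$Sex_m_ref <- relevel(toy$Sex, ref = "m")
fit_m <- lm(OS ~ Sex_m_ref, data = toy)
coef(fit_m)
```

Releveling the factor only changes which level is reported as the baseline. The fitted model is the same, and the coefficient simply flips sign.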
Upvotes: 1