spx31

Reputation: 133

Finalfit in R doesn't calculate multivariable regression coefficients for some categorical variables

I have a tibble with 2 columns and 200 rows. The first column is OS, for Overall Survival in days, and the other column is Sex, which is either "m" or "f". There are no missing values anywhere.

Now I run a multivariable regression with finalfit like this:

library(tidyverse)
library(finalfit)

fit <- data %>% 
    finalfit("OS","Sex",metrics = TRUE)

and get the following output:

[[1]]
Dependent: OS     unit       value          Coefficient (univariable)          Coefficient (multivariable)
Sex            f  Mean (sd)  568.8 (380.6)  -                                  -
               m  Mean (sd)  601.5 (378.0)  32.75 (-77.03 to 142.52, p=0.557)  32.75 (-77.03 to 142.52, p=0.557)

So it basically produces no coefficient for the female sex, only for male sex. I have this problem with more categorical variables, too, where the function returns no coefficient for a certain value of a categorical variable.

I do not have this problem with continuous variables, e.g. using Age instead of Sex.

I am at a loss as to why this is. Any help on where to start debugging is appreciated.

Upvotes: 0

Views: 122

Answers (1)

DaveArmstrong

Reputation: 21757

This is the desired behaviour of the model. Let's say you want to predict y with sex, which has values m and f. What you're asking for is a separate coefficient for both m and f. That would give you a design matrix that looks like:

Intercept m f
        1 1 0
        1 0 1
...

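You can see that Intercept, m and f are perfectly collinear: f = Intercept - m, so f is an exact linear combination of Intercept and m. That means there is no unique solution for all three coefficients, so one level has to be dropped and becomes the reference category. Here f is the reference: the coefficient for m (32.75) is the difference in mean OS between m and f, which matches the group means in your table (601.5 - 568.8 ≈ 32.7). So the model you're estimating with finalfit() is already providing you with the relevant information.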
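
If it helps to see this outside of finalfit, here is a minimal sketch in base R. The data frame toy and its values are made up for illustration (they are not your data), and it uses plain lm() rather than finalfit(), on the assumption that the same dummy coding applies to a continuous outcome like OS.

```r
# Hypothetical toy data, NOT the asker's tibble
toy <- data.frame(
  OS  = c(500, 620, 580, 640, 560, 610),
  Sex = factor(c("f", "m", "f", "m", "f", "m"))
)

# Design matrix with an intercept AND one dummy column per level
X_full <- cbind(
  Intercept = 1,
  m = as.numeric(toy$Sex == "m"),
  f = as.numeric(toy$Sex == "f")
)
# Three columns, but the rank is only 2 because f = Intercept - m
qr(X_full)$rank

# R's default treatment coding avoids this by dropping the first level (f)
X <- model.matrix(~ Sex, data = toy)
colnames(X)

# The remaining coefficient is the difference in group means (m minus f),
# and the intercept is the mean of the reference level f
fit   <- lm(OS ~ Sex, data = toy)
means <- tapply(toy$OS, toy$Sex, mean)
coef(fit)
means["m"] - means["f"]

# To see a row for f instead, make m the reference level
toy$Sex_m_ref <- relevel(toy$Sex, ref = "m")
fit_m <- lm(OS ~ Sex_m_ref, data = toy)
coef(fit_m)
```

With m as the reference, the coefficient for f is the same difference with the opposite sign, so no information is gained or lost by the choice of reference level.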

Upvotes: 1
