Reputation: 49
This is my data, head(both):
season  gender  age  prog   grade
fall    woman   old  FRIST  B
fall    woman   old  FRIST  A
spring  woman   old  FRIST  E
spring  man     old  NMATK  C
spring  woman   old  NFYSK  A
fall    woman   old  FRIST  E
I want to fit a logistic regression with grade as the response variable. I want four equations, one per split of the grade scale, each independent of the others:
E/(A+B+C+D) = alpha_1 + beta^x_1 + beta^y_1 + ...
(D+E)/(A+B+C) = alpha_2 + beta^x_2 + beta^y_2 + ...
(C+D+E)/(A+B) = alpha_3 + beta^x_3 + beta^y_3 + ...
(B+C+D+E)/A = alpha_4 + beta^x_4 + beta^y_4 + ...
What I have done:
library(MASS)
# Response as a factor; the column is shown as "grade" in head(both) above
y <- factor(both$grade)
mod.fit <- polr(y ~ prog + gender + age + season, data = both, Hess = TRUE)
summary(mod.fit)
Then I get this message:
Warning message:
In polr(y ~ prog + gender + age + season, data = both, Hess = TRUE) :
  design appears to be rank-deficient, so dropping some coefs
I know this is a warning, not an error, but I do not know how to interpret it or what to do differently to avoid it.
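A rough sketch of one way the warning could be traced: compare the number of columns in the design matrix with its numerical rank, then cross-tabulate the predictors to look for combinations that never occur. The toy data frame below is made up for illustration (it is not the real both), and age is deliberately made a function of prog so that the redundancy shows up.

```r
# Toy data, NOT the original `both`: age is fully determined by prog
set.seed(1)
toy <- data.frame(
  prog   = factor(rep(c("FRIST", "NMATK", "NFYSK"), each = 20)),
  gender = factor(sample(c("man", "woman"), 60, replace = TRUE)),
  season = factor(sample(c("fall", "spring"), 60, replace = TRUE))
)
toy$age <- factor(ifelse(toy$prog == "FRIST", "old", "young"))

# Design matrix for the same right-hand side as in the polr call
X <- model.matrix(~ prog + gender + age + season, data = toy)

ncol(X)       # number of columns (coefficients requested)
qr(X)$rank    # numerical rank; a smaller value means some columns are redundant

# Empty cells in a cross-tabulation point at the confounded predictors
table(toy$prog, toy$age)
```

If the rank is lower than the column count on the real data, dropping or recoding one of the confounded predictors would be one way to make the warning go away.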
Upvotes: 1
Views: 4667
Reputation: 15773
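Since your outcome is ordered, you'll probably do better with an ordinal model, but you may want to check the proportional odds assumption. The model you're describing is pretty much what polr does, though the four equations are not independent as you say: polr estimates a separate intercept for each cut-point but shares one set of slope coefficients across all of them. UCLA has a good tutorial on this.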
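To make the comparison concrete, here is a minimal sketch on simulated data (the toy data frame, its effect sizes, and the cut-points are all assumptions, not your data). It fits polr once, then fits four separate binary logits, one per split of the grade scale, so the slopes can be compared informally. Note that the binary models below are for P(grade above the cut), the reciprocal of the odds written in the question, so that the signs line up with polr's.

```r
library(MASS)

# Simulated toy data (assumption): grades E < D < C < B < A
set.seed(42)
n <- 500
toy <- data.frame(
  gender = factor(sample(c("man", "woman"), n, replace = TRUE)),
  season = factor(sample(c("fall", "spring"), n, replace = TRUE))
)
latent <- 0.8 * (toy$gender == "woman") -
  0.5 * (toy$season == "spring") + rlogis(n)
toy$grade <- cut(latent, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
                 labels = c("E", "D", "C", "B", "A"), ordered_result = TRUE)
toy$grade.num <- as.integer(toy$grade)   # E = 1, ..., A = 5

# Proportional odds model: four intercepts, one shared set of slopes
po.fit <- polr(grade ~ gender + season, data = toy, Hess = TRUE)

# Four separate binary logits, one per cut-point
sep.slopes <- sapply(1:4, function(k) {
  fit <- glm(I(grade.num > k) ~ gender + season,
             data = toy, family = binomial)
  coef(fit)[-1]                          # keep slopes, drop the intercept
})

coef(po.fit)      # shared slopes from polr
t(sep.slopes)     # one row of slopes per cut-point
```

If the rows of the second table are roughly equal to each other and to the polr slopes, the proportional odds assumption looks plausible; large differences between rows suggest it does not hold.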
As for determining which model is best, when dealing with fundamentally different types of models like these, I'd recommend cross-validation. Prediction accuracy doesn't lie, and any pseudo-R^2 metrics will differ in interpretation across models.
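A minimal sketch of what that cross-validation could look like, assuming simulated data and assuming the two candidates are polr and a multinomial logit from nnet (the choice of competitor, the fold count, and the toy data are mine, purely for illustration):

```r
library(MASS)   # polr
library(nnet)   # multinom

# Simulated toy data (assumption), same structure as a five-grade outcome
set.seed(7)
n <- 400
toy <- data.frame(
  gender = factor(sample(c("man", "woman"), n, replace = TRUE)),
  season = factor(sample(c("fall", "spring"), n, replace = TRUE))
)
latent <- 0.8 * (toy$gender == "woman") -
  0.5 * (toy$season == "spring") + rlogis(n)
toy$grade <- cut(latent, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
                 labels = c("E", "D", "C", "B", "A"), ordered_result = TRUE)

k <- 5
fold <- sample(rep(1:k, length.out = n))   # random fold assignment

# Held-out classification accuracy for each model in each fold
acc <- sapply(1:k, function(i) {
  train <- toy[fold != i, ]
  test  <- toy[fold == i, ]
  po <- polr(grade ~ gender + season, data = train)
  mn <- multinom(grade ~ gender + season, data = train, trace = FALSE)
  truth <- as.character(test$grade)
  c(polr     = mean(as.character(predict(po, newdata = test)) == truth),
    multinom = mean(as.character(predict(mn, newdata = test)) == truth))
})

rowMeans(acc)   # mean held-out accuracy per model
```

Plain accuracy ignores how far off a wrong grade is, so for an ordered outcome a distance-based score might be a better choice; accuracy is used here only to keep the sketch short.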
Also, since this question concerns statistics more than R coding/implementation, I'd recommend CrossValidated (the stats StackExchange site).
Upvotes: 1