Reputation: 1561
I am running MNLogit (multinomial logistic regression) as follows:
from statsmodels.formula.api import MNLogit
model=MNLogit.from_formula("y ~ x", df).fit()
model.summary()
The variable y is categorical and seems to be automatically dummy encoded by the MNLogit function. The summary output gives a row for each category of y except for the reference category.
1) How can I get the identity of the reference category? (It is tedious to figure this out manually due to the many categories for y)
2) As there is no z or P>|z| (p-value) given for the reference category, how can I assess significance for the reference category?
3) How can I change which category is treated as the reference category?
Upvotes: 0
Views: 2316
Reputation: 759
I believe that statsmodels' MNLogit always uses the first value in a string-sorted listing of your possible y values as the referent. You can check which value that is with model.model._ynames_map in your example. This returns a dictionary, and the value under key 0 should be the one used as the referent.
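This site provides some information on how to interpret the referent. I won't belabor the point by retyping it. It is not in Python, but the tenets of statistics hold across languages. In short, the referent's coefficients are fixed at zero, so there is no z or p-value to report for it; every row in the summary is a comparison of one category against the referent.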
Since the first sorted response serves as the referent, I believe that to change it you would have to rename the response you want, for example by prefixing it with 'AAAAA' so that it appears first in the sorted listing. That is unnecessary, though, once you know which category is the referent and adjust the wording of your conclusions accordingly.
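If you would rather not rename the labels, one alternative is to code y as integers yourself, with the category you want as the referent coded 0. This is only a sketch under assumptions: the data is made up, encode_with_reference is a hypothetical helper of mine (not part of statsmodels), and it relies on the same lowest-value-first rule described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=300),
    "y": rng.choice(["bird", "cat", "dog"], size=300),
})

def encode_with_reference(labels, reference):
    """Hypothetical helper: integer codes with `reference` coded as 0.

    Returns the codes and the ordered label list, so that code i
    corresponds to order[i].
    """
    others = sorted(set(labels) - {reference})
    order = [reference] + others
    codes = pd.Categorical(labels, categories=order).codes.astype(int)
    return codes, order

# Make "dog" the referent by giving it the lowest code.
df["y_code"], order = encode_with_reference(df["y"], "dog")

# Fit on the numeric codes; the lowest code (0) is the omitted category.
result = smf.mnlogit("y_code ~ x", df).fit(disp=False)

print(dict(enumerate(order)))  # code -> original label
print(result.params)           # one column per non-reference category
```

The order list doubles as the lookup table for reading the summary, since the rows there are labelled by code rather than by the original strings.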
Upvotes: 1
Reputation: 10375
Upvotes: 0