Reputation: 1561

Python statsmodels: Regression summary, how to get p-value for reference dummy variable?

I am running MNLogit (multinomial logistic regression) as follows:

from statsmodels.formula.api import MNLogit
model = MNLogit.from_formula("y ~ x", df).fit()
model.summary()

The variable y is categorical and seems to be automatically dummy encoded by the MNLogit function. The summary output gives a row for each category of y except for the reference category.

1) How can I get the identity of the reference category? (It is tedious to figure this out manually due to the many categories for y)

2) As there is no z or P>|z| (p-value) given for the reference category, how can I assess significance for the reference category?

3) How can I change which category is treated as the reference category?

Upvotes: 0

Answers (2)

jtweeder

Reputation: 759

I believe that with statsmodels MNLogit, the first value in a string-sorted listing of your possible y values is always used as the referent. You can check which one that is with model.model._ynames_map in your example. This returns a dictionary, and the value under key 0 should be the one used as the referent.
This site provides some information on how to interpret the referent. I won't belabor the point by retyping it. It is not in Python, but the tenets of statistics hold across languages.
Since the first sorted response serves as the referent, I believe you would have to rename the response you want, for example by prefixing it with 'AAAAA' or something similar, to make sure it appears first in the listing. That is unnecessary, though, once you know which category is the referent and adjust the wording of your conclusions accordingly.
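
To make the sorting rule concrete, here is a minimal sketch in plain Python that does not call statsmodels at all. It assumes, as described above, that the referent is simply the label that sorts first. The labels and the helper names presumed_referent and force_referent are made up for illustration.

```python
def presumed_referent(labels):
    """Return the label that sorts first, i.e. the presumed referent."""
    return sorted(set(labels))[0]


def force_referent(labels, wanted, prefix="AAAAA_"):
    """Prefix one label so that it sorts ahead of all the others.

    Assumption: no other label already sorts before the prefix.
    """
    return [prefix + lab if lab == wanted else lab for lab in labels]


# Hypothetical response values, for illustration only.
y = ["dog", "cat", "bird", "cat", "dog"]

print(presumed_referent(y))                         # bird
print(presumed_referent(force_referent(y, "dog")))  # AAAAA_dog
```

On a real fit, the value under key 0 of model.model._ynames_map is the thing to compare against what this sketch predicts.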

Upvotes: 1

user2974951

Reputation: 10375

1) The intercept term corresponds to your reference level (that is, the "missing" category). You can find out what your reference is by checking the first level of the variable.
2) The test statistic and p-value for the reference category are in the Intercept term.
3) Relevel your categorical variable. Optionally, you can use different contrast treatments to set the kind of contrasts you want.
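
As a rough illustration of releveling, the sketch below reorders the levels of a pandas categorical so that a chosen level comes first. The data are made up. Whether MNLogit then treats that first level as the reference is an assumption here and worth confirming on your own fit.

```python
import pandas as pd

# Hypothetical response column; the labels are made up for illustration.
y = pd.Series(["dog", "cat", "bird", "cat", "dog"], dtype="category")
print(list(y.cat.categories))          # ['bird', 'cat', 'dog']

# Move "dog" to the front so that it becomes the first level.
releveled = y.cat.reorder_categories(["dog", "bird", "cat"])
print(list(releveled.cat.categories))  # ['dog', 'bird', 'cat']

# Integer codes follow the new level order: "dog" is now code 0.
print(releveled.cat.codes.tolist())    # [0, 2, 1, 2, 0]
```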

Upvotes: 0
