Margot Boekema

Clustered standard errors for panel data in a multinomial logistic regression in Python (e.g. the statsmodels package) via a sandwich estimator

I am estimating an MNL regression model for customer choices. I have a panel data set with N = \sum_{i=1}^{I} T_i observations, where individual i contributes T_i choice occasions. Since one individual makes multiple choices in my data set, the standard errors should be adjusted for clustering. I use the MNLogit model from statsmodels (statsmodels/discrete/discrete_model.py). However, even though other models do provide cluster-robust standard errors, I cannot find how to obtain them for the MNL model. I tried doing it myself by building a sandwich estimator with the code below.
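For reference, this is the cluster-robust sandwich estimator I am trying to build, where H is the Hessian of the log-likelihood, s_g is the score summed over the observations of cluster g, and G is the number of clusters (I am not certain this is the right form for MNLogit, which is part of my question):

\hat{V}_{\text{cluster}} = H^{-1} \left( \sum_{g=1}^{G} s_g s_g^{\top} \right) H^{-1}, \qquad s_g = \sum_{t \in g} \frac{\partial \ell_t}{\partial \theta}

(The textbook form uses (-H)^{-1} on both sides, but since H appears twice the signs cancel.)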

Can someone please tell me whether this makes sense for this model? If not, what is the best way to do it?

Python code:

import numpy as np
import statsmodels.api as sm
from numpy.linalg import inv

# df, varnames, 'CHOICE' and 'KEY_CUSTOMER' come from my data set
X = df[varnames].values
y = df['CHOICE'].values

# Add an intercept and fit the multinomial logit
X = sm.add_constant(X, prepend=False)
mnlogit_mod = sm.MNLogit(y, X)
mnlogit_fit = mnlogit_mod.fit()

#####
print("Log-likelihood for each observation:")
loglike_obs = mnlogit_fit.model.loglikeobs(mnlogit_fit.params)
print(loglike_obs)
print("Shape of log-likelihood for each observation:", loglike_obs.shape)

print("\nScore/gradient vector for each variable: ndarray, (K * (J-1),)")
score = mnlogit_fit.model.score(mnlogit_fit.params)
print(score)
print("Shape of score/gradient vector:", score.shape)

print("\nLoglik and score matrix:")
loglike_and_score = mnlogit_fit.model.loglike_and_score(mnlogit_fit.params)
print(loglike_and_score)
print("Shape of loglik and score matrix:", loglike_and_score[0].shape, loglike_and_score[1].shape)

print("\nHessian:")
Hessian = mnlogit_fit.model.hessian(mnlogit_fit.params)
print(Hessian)
print("Shape of Hessian matrix:", Hessian.shape)

print("\nJacobian:")
Jacobian = mnlogit_fit.model.score_obs(mnlogit_fit.params)
print(Jacobian)
print("Shape of Jacobian matrix:", Jacobian.shape)

print("\nEstimated parameters:")
print(mnlogit_fit.params)
print("Shape of estimated parameters:", mnlogit_fit.params.shape)
###
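
# For reference, the shapes I expect with K regressors (incl. the constant),
# J choice alternatives and N observations -- based on my reading of
# discrete_model.py, please correct me if I am wrong:
#   loglikeobs: (N,)               one log-likelihood per observation
#   score:      (K*(J-1),)         stacked gradient vector
#   score_obs:  (N, K*(J-1))       per-observation scores (the "Jacobian")
#   hessian:    (K*(J-1), K*(J-1))
#   params:     (K, J-1)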

# Here I try to adjust the standard errors for clustering

clusters = df['KEY_CUSTOMER'].values
cluster_ids = np.unique(clusters)
clustered_scores = np.zeros((len(cluster_ids), Jacobian.shape[1]))

# Sum the per-observation scores within each cluster
for i, cluster in enumerate(cluster_ids):
    clustered_scores[i] = Jacobian[clusters == cluster].sum(axis=0)

# "Meat" of the sandwich: outer products of the cluster-level scores
OPG_sum = np.zeros((Jacobian.shape[1], Jacobian.shape[1]))
for g in clustered_scores:
    OPG_sum += np.outer(g, g)

# "Bread": inverse Hessian on both sides (the signs of H cancel)
H_inv = inv(Hessian)
robust_cov = H_inv @ OPG_sum @ H_inv

# Standard errors
robust_se = np.sqrt(np.diag(robust_cov))

print("Robust Standard Errors:", robust_se)
