I'm trying to expand a replication dataset so I can estimate a multinomial logit model of a categorical outcome. But the mlogit.data function is coercing my outcome variable, which takes the values 0, 1, and 2, to a TRUE/FALSE logical, which I think is screwing up my modeling.
When I run summary(data_filtered) on my baseline dataframe, I can still see the full range of values for M_RUF, my primary outcome variable.
X M_RUF RUF2 CDF MUD EDUCATION
Min. : 2.0 Min. :0.0000 Min. :0.0000 Min. :0 Min. :0.0000 Min. :0.0000
1st Qu.: 197.8 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0 1st Qu.:0.0000 1st Qu.:0.0000
Median : 570.5 Median :1.0000 Median :1.0000 Median :0 Median :1.0000 Median :1.0000
Mean : 536.0 Mean :0.7591 Mean :0.6757 Mean :0 Mean :0.6304 Mean :0.8098
3rd Qu.: 821.0 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0 3rd Qu.:1.0000 3rd Qu.:2.0000
Max. :1227.0 Max. :2.0000 Max. :1.0000 Max. :0 Max. :1.0000 Max. :2.0000
SLPP Mende NOPARTY EXC_SURV_LOC id
Min. :0.000 Min. :0.000 Min. :0.000 Min. : 1.00 Min. : 1.0
1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:27.75 1st Qu.: 141.8
Median :0.000 Median :1.000 Median :1.000 Median :42.00 Median : 4007.5
Mean :0.212 Mean :0.538 Mean :0.529 Mean :41.06 Mean : 4873.6
3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:56.00 3rd Qu.: 8036.2
Max. :1.000 Max. :1.000 Max. :1.000 Max. :66.00 Max. :14086.0
But that gets screwy once I pass the data through mlogit.data, which coerces M_RUF into a logical.
data_long <- mlogit.data(data_filtered, shape = "wide", choice = "M_RUF", id.var = "id")
summary(data_long)
X M_RUF RUF2 CDF MUD EDUCATION
Min. : 2.0 Mode :logical Min. :0.0000 Min. :0 Min. :0.0000 Min. :0.0000
1st Qu.: 197.8 FALSE:1104 1st Qu.:0.0000 1st Qu.:0 1st Qu.:0.0000 1st Qu.:0.0000
Median : 570.5 TRUE :552 Median :1.0000 Median :0 Median :1.0000 Median :1.0000
Mean : 536.0 Mean :0.6757 Mean :0 Mean :0.6304 Mean :0.8098
3rd Qu.: 821.0 3rd Qu.:1.0000 3rd Qu.:0 3rd Qu.:1.0000 3rd Qu.:2.0000
Max. :1227.0 Max. :1.0000 Max. :0 Max. :1.0000 Max. :2.0000
SLPP Mende NOPARTY EXC_SURV_LOC id
Min. :0.000 Min. :0.000 Min. :0.000 Min. : 1.00 Min. : 1.0
1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:27.75 1st Qu.: 141.8
Median :0.000 Median :1.000 Median :1.000 Median :42.00 Median : 4007.5
Mean :0.212 Mean :0.538 Mean :0.529 Mean :41.06 Mean : 4873.6
3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:56.00 3rd Qu.: 8036.2
Max. :1.000 Max. :1.000 Max. :1.000 Max. :66.00 Max. :14086.0
chid alt idx.chid idx.alt
Min. : 2.0 0:552 Min. : 2.0000 0:552
1st Qu.: 191.8 1:552 1st Qu.: 191.7500 1:552
Median : 542.5 2:552 Median : 542.5000 2:552
Mean : 503.7 Mean : 503.6975 NA
3rd Qu.: 770.0 3rd Qu.: 770.0000 NA
Max. :1103.0 Max. :1103.0000 NA
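One possible reading of the output above, offered as an assumption rather than something confirmed here: the long data holds each of the 552 original rows once per alternative (0, 1, 2), and the logical M_RUF flags the row whose alt matches the original choice. The counts are consistent with that (552 TRUE, 1104 FALSE). A minimal base-R sketch of that kind of wide-to-long expansion, using a made-up four-row data frame. The names wide, long, and chosen are hypothetical, and this is not mlogit's actual implementation:

```r
# Toy wide data: one row per respondent, outcome coded 0/1/2 (made-up values)
wide <- data.frame(id = 1:4, M_RUF = c(0, 2, 1, 2))
alts <- sort(unique(wide$M_RUF))

# Expand to one row per respondent-alternative pair (cross join)
long <- merge(wide, data.frame(alt = alts), by = NULL)
long <- long[order(long$id, long$alt), ]

# The choice becomes a flag: TRUE only on the row of the chosen alternative
long$chosen <- long$M_RUF == long$alt

table(long$chosen)            # one TRUE per respondent, FALSE elsewhere
table(long$alt[long$chosen])  # the original 0/1/2 distribution is still recoverable
```

If that reading holds for data_long, the three-category outcome would not be lost: it would be carried jointly by alt and the logical M_RUF.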
When I try to run my model, I get a singular-matrix error, which I read as a collinearity problem. That doesn't make sense given the variation I observe in my pre-mlogit.data dataframe.
mlogit_model <- mlogit(M_RUF ~ MUD + EDUCATION + SLPP + Mende + NOPARTY | 0, data = data_long)
Error in solve.default(H, g[!fixed]) : Lapack routine dgesv: system is exactly singular: U[1,1] = 0