mmann1123

Reputation: 5295

Error with levels using mlogit in R

I am having trouble with factor levels when running the following:

library(mlogit)

panel.datasm = data.frame(
    cbind( 
        round(runif(100, min=1, max=6)), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
  "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:5, shape = "wide", sep = "_")

I keep getting this error: Error in Ops.factor(data[[choice]], alt) : level sets of factors are different

I have also tried assigning levels manually:

panel.datasm$id= factor(
    panel.datasm$id, 
    levels = sort(as.character(unique(panel.datasm$id)))  )

I have tried a number of things and can't figure out what is going wrong. For comparison, take a look at:

data("Electricity", package = "mlogit")
head(Electricity)
Electr <- mlogit.data(Electricity, id = "id", choice = "choice", 
    varying = 3:26, shape = "wide", sep = "")

As far as I can tell, this is identical to my data format. What's going on here? I'm at my wits' end.

Upvotes: 3

Views: 3770

Answers (3)

EDennnis

Reputation: 331

The error comes from the reshape step that mlogit.data performs. It is unable to determine the time element (here, the alternative) when converting the data from wide to long.

The help page ?mlogit.data gives the solution under the option "alt.levels", stating: "the name of the alternatives: if null, for a wide data.frame, they are guessed from the variable names and the choice variable (both should be the same)".

Since you are not giving the names of the alternatives, reshape has to guess them and cannot determine them. The fix, then, is to provide those names manually. Leaving the data as provided in the question, you would use the following:

logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice",
                      varying = 3:8, shape = "wide", sep = "_",
                      alt.levels = c("data_1991", "data_1992", "data_1993",
                                     "data2_1991", "data2_1992", "data2_1993"))

Note: as @James mentioned, varying should be 3:8, not 3:5.

Upvotes: 0

Andrie

Reputation: 179448

I believe I have traced the problem. The values of your choice variable must be the same as the names of your alternatives, which are taken from the varying column names.

If you change the first column of your data.frame to hold values from 1991:1993, it will work.

panel.datasm = data.frame(
    cbind( 
        sample(1991:1993, 100, replace=TRUE), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
    "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:5, shape = "wide", sep = "_") 

The results:

head(logit.data)
       choice id  alt       data     data2 chid
1.1991  FALSE  1 1991 0.03540498 0.9726110    1
1.1992  FALSE  1 1992 5.85285278 2.7973798    1
1.1993   TRUE  1 1993 5.80795641 3.7360297    1
2.1991   TRUE  1 1991 0.59255235 0.2564928    2
2.1992  FALSE  1 1992 5.81443351 3.0820215    2
2.1993  FALSE  1 1993 2.11699854 5.4161634    2

If you now compare it with Electricity, the difference is obvious. Notice that choice takes values 1 to 4, and the column suffixes (pf1 to pf4, cl1 to cl4, and so on) name those same alternatives 1 to 4.

head(Electricity)
  choice id pf1 pf2 pf3 pf4 cl1 cl2 cl3 cl4 loc1 loc2 loc3 loc4 wk1 wk2 wk3 wk4
1      4  1   7   9   0   0   5   1   0   5    0    1    0    0   1   0   0   1
2      3  1   7   9   0   0   0   5   1   5    0    0    1    0   1   1   0   0
3      4  1   9   7   0   0   5   1   0   0    0    0    0    1   0   1   1   0
4      4  1   0   9   7   0   1   1   0   5    0    0    1    0   1   0   0   1
5      1  1   0   9   0   7   0   1   0   5    1    0    0    0   0   1   0   1
6      4  1   0   9   0   7   0   0   1   5    0    0    1    0   0   0   0   1
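
The mismatch described above can be checked in base R before calling mlogit.data. This is only a sketch: the helper names (varying_cols, alt_names, choice_bad, choice_good) are made up for illustration, and it assumes the alternative name is whatever follows the "_" separator in each varying column name.

```r
# Varying column names from the question; the part after "_" is
# assumed to be the alternative name.
varying_cols <- c("data_1991", "data_1992", "data_1993",
                  "data2_1991", "data2_1992", "data2_1993")
alt_names <- unique(sub("^.*_", "", varying_cols))

set.seed(42)  # arbitrary seed, only for reproducibility

# Choice column as built in the question: values between 1 and 6.
choice_bad <- round(runif(100, min = 1, max = 6))
# Choice column as built in this answer: values drawn from the alternative names.
choice_good <- sample(1991:1993, 100, replace = TRUE)

# TRUE only if every choice value is one of the alternative names.
bad_ok  <- all(as.character(choice_bad)  %in% alt_names)
good_ok <- all(as.character(choice_good) %in% alt_names)

print(alt_names)
print(c(bad = bad_ok, good = good_ok))
```

If the first check comes back FALSE, as it does for the original data, the choice column and the alternatives do not share the same set of values.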

Upvotes: 2

Ramnath

Reputation: 55695

The problem is that the row.names created by reshape are not unique, and that is causing trouble. Here is a quick fix: add a chid.var that is unique for each row. I have used the index function from zoo to do that, though other approaches should work as well.

mlogit.data(panel.datasm, choice = 'choice', id = 'id', shape = 'wide', 
 varying = 3:8, sep = "_", chid.var = 1:NROW(index))

        choice id  alt     data      data2
1.1991  FALSE  1 1991 0.4769187 0.97381645
1.1992  FALSE  1 1992 3.2998748 0.70989021
1.1993  FALSE  1 1993 5.6199917 5.53069555
2.1991  FALSE  1 1991 0.3615670 0.02066214
2.1992  FALSE  1 1992 2.0461820 0.41804600
2.1993  FALSE  1 1993 2.2764992 3.93337758

Upvotes: 0
