Reputation: 5295
I am having some trouble with levels... Running the following:
library(mlogit)
panel.datasm = data.frame(
  cbind(
    round(runif(100, min=1, max=6)),
    rep(1:20, each=5),
    runif(100, min=0, max=1),
    runif(100, min=0, max=6),
    runif(100, min=2, max=6),
    runif(100, min=0, max=1),
    runif(100, min=0, max=6),
    runif(100, min=2, max=6)))
names(panel.datasm) = c("choice", "id", "data_1991", "data_1992",
                        "data_1993", "data2_1991", "data2_1992", "data2_1993")
logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice",
                          varying = 3:5, shape = "wide", sep = "_")
I keep getting the error: Error in Ops.factor(data[[choice]], alt) : level sets of factors are different
I have also tried assigning levels manually:
panel.datasm$id = factor(
  panel.datasm$id,
  levels = sort(as.character(unique(panel.datasm$id))))
I have tried a number of things and can't figure out what is going wrong. For comparison, take a look at:
data("Electricity", package = "mlogit")
head(Electricity)
Electr <- mlogit.data(Electricity, id = "id", choice = "choice",
                      varying = 3:26, shape = "wide", sep = "")
As far as I can tell, this is identical to my data format. What's going on here? I'm at my wit's end.
Upvotes: 3
Views: 3770
Reputation: 331
The error comes from the reshape package. It is unable to determine the time element when converting the data.
The mlogit help guide ?mlogit.data provides the solution to this under the option "alt.levels" stating: "the name of the alternatives: if null, for a wide data.frame, they are guessed from the variable names and the choice variable (both should be the same)".
Since you are not giving the names of the alternatives, reshape is guessing and cannot determine them. The fix, then, is to provide those names manually. Leaving the data as provided in the question, you would use the following:
logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice",
                          varying = 3:8, shape = "wide", sep = "_",
                          alt.levels = c("data_1991", "data_1992", "data_1993",
                                         "data2_1991", "data2_1992", "data2_1993"))
*Note: As was mentioned by @James, you should vary from 3:8 NOT 3:5.
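To get a feel for what can be "guessed from the variable names", here is a rough base-R sketch that splits the column names from the question at sep = "_". It only mimics the idea and is not the code that mlogit.data or reshape actually runs; the names varying_names, stems and alts are made up for illustration.

```r
# Names of columns 3:8 of panel.datasm, copied from the question
varying_names <- c("data_1991", "data_1992", "data_1993",
                   "data2_1991", "data2_1992", "data2_1993")

# Split each name at the separator: the stem comes before "_",
# the alternative comes after it
parts <- do.call(rbind, strsplit(varying_names, "_", fixed = TRUE))
stems <- unique(parts[, 1])  # candidate variable names
alts  <- unique(parts[, 2])  # candidate alternative names

print(stems)
print(alts)
```

Under this reading the column names carry two variables (data and data2) and three alternatives (1991, 1992 and 1993), which is also why all of columns 3:8 need to be listed in varying.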
Upvotes: 0
Reputation: 179448
I believe I have traced the problem. The values of your choice variable and the names of your alternatives should be the same. If you change the first column of your data.frame to hold values in 1991:1993, it will work.
panel.datasm = data.frame(
  cbind(
    sample(1991:1993, 100, replace=TRUE),
    rep(1:20, each=5),
    runif(100, min=0, max=1),
    runif(100, min=0, max=6),
    runif(100, min=2, max=6),
    runif(100, min=0, max=1),
    runif(100, min=0, max=6),
    runif(100, min=2, max=6)))
names(panel.datasm) = c("choice", "id", "data_1991", "data_1992",
                        "data_1993", "data2_1991", "data2_1992", "data2_1993")
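# varying covers columns 3:8 so that both data_* and data2_* are reshaped,
# which matches the data and data2 columns in the output
logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice",
                          varying = 3:8, shape = "wide", sep = "_")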
The results:
head(logit.data)
       choice id  alt       data     data2 chid
1.1991  FALSE  1 1991 0.03540498 0.9726110    1
1.1992  FALSE  1 1992 5.85285278 2.7973798    1
1.1993   TRUE  1 1993 5.80795641 3.7360297    1
2.1991   TRUE  1 1991 0.59255235 0.2564928    2
2.1992  FALSE  1 1992 5.81443351 3.0820215    2
2.1993  FALSE  1 1993 2.11699854 5.4161634    2
If you now compare it with Electricity, the difference is obvious. Notice that the choices there are 1:4, and the alternatives in the column names (pf1 to pf4, cl1 to cl4, and so on) also run from 1 to 4.
head(Electricity)
  choice id pf1 pf2 pf3 pf4 cl1 cl2 cl3 cl4 loc1 loc2 loc3 loc4 wk1 wk2 wk3 wk4
1      4  1   7   9   0   0   5   1   0   5    0    1    0    0   1   0   0   1
2      3  1   7   9   0   0   0   5   1   5    0    0    1    0   1   1   0   0
3      4  1   9   7   0   0   5   1   0   0    0    0    0    1   0   1   1   0
4      4  1   0   9   7   0   1   1   0   5    0    0    1    0   1   0   0   1
5      1  1   0   9   0   7   0   1   0   5    1    0    0    0   0   1   0   1
6      4  1   0   9   0   7   0   0   1   5    0    0    1    0   0   0   0   1
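A quick way to check for this kind of mismatch before calling mlogit.data is sketched below, using base R only. The helper unmatched_choices is hypothetical (it is not part of mlogit), and it assumes the alternative is whatever follows the last separator in each column name.

```r
# Hypothetical helper, not part of mlogit: returns the choice values that
# have no matching alternative suffix among the varying column names.
# Note that sep is pasted into a regular expression, so it should not
# contain regex metacharacters.
unmatched_choices <- function(choice, varying_names, sep = "_") {
  alts <- unique(sub(paste0("^.*", sep), "", varying_names))
  setdiff(as.character(unique(choice)), alts)
}

varying_names <- c("data_1991", "data_1992", "data_1993",
                   "data2_1991", "data2_1992", "data2_1993")

set.seed(42)
bad_choice  <- round(runif(100, min = 1, max = 6))      # as in the question
good_choice <- sample(1991:1993, 100, replace = TRUE)   # as in this answer

unmatched_choices(bad_choice, varying_names)   # non-empty: no value is a year
unmatched_choices(good_choice, varying_names)  # character(0)
```

An empty result means every observed choice value corresponds to an alternative named in the columns; anything else points at the values that would not line up.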
Upvotes: 2
Reputation: 55695
The problem is that the row.names created by reshape are not unique, and that is causing trouble. Here is a quick fix: you need to add a chid.var that is unique for each row. I have used the index function from zoo to do that; there are probably other ways as well.
mlogit.data(panel.datasm, choice = 'choice', id = 'id', shape = 'wide',
            varying = 3:8, sep = "_", chid.var = 1:NROW(index))
       choice id  alt      data      data2
1.1991  FALSE  1 1991 0.4769187 0.97381645
1.1992  FALSE  1 1992 3.2998748 0.70989021
1.1993  FALSE  1 1993 5.6199917 5.53069555
2.1991  FALSE  1 1991 0.3615670 0.02066214
2.1992  FALSE  1 1992 2.0461820 0.41804600
2.1993  FALSE  1 1993 2.2764992 3.93337758
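As one of the other ways, a plain row counter in base R would also give a unique id per row of the wide data. This is only a sketch: the data frame panel and the column name chid are made up for illustration, and handing that column name to mlogit.data through chid.var is an assumption based on how the argument is described in ?mlogit.data, not something verified here.

```r
set.seed(1)
# Small stand-in for the wide data: 20 individuals with 5 rows each
panel <- data.frame(id = rep(1:20, each = 5),
                    choice = sample(1991:1993, 100, replace = TRUE))

# Each row of the wide data is one choice situation,
# so a simple row counter is unique by construction
panel$chid <- seq_len(nrow(panel))

# anyDuplicated() returns 0 when there are no repeated values
stopifnot(anyDuplicated(panel$chid) == 0)
head(panel)
```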
Upvotes: 0