Reputation: 67
I have a dataset that looks like this:
Observation Outcome VariableA VariableB VariableC
1 1 1.27 0.2 0.81
2 0 0.20 0.45 0.70
3 -1 0.27 1.2 0.56
The Outcome variable takes the values 1, 0, or -1 and is meant to be the dependent variable in a multinomial logit model that I want to fit in R with the mlogit package. I transformed my data using the following code:
mlogitdataset <- mlogit.data(dataset, choice = "Outcome", shape="wide")
which gives me the following new dataset:
Observation Outcome VariableA VariableB VariableC alt
1 FALSE 1.27 0.2 0.81 -1
1 FALSE 1.27 0.2 0.81 0
1 TRUE 1.27 0.2 0.81 1
2 FALSE 0.20 0.45 0.70 -1
2 TRUE 0.20 0.45 0.70 0
2 FALSE 0.20 0.45 0.70 1
This is essentially how I want the data to be structured. However, I do not want to use VariableA, VariableB, and VariableC as separate independent variables in the multinomial logit regression. Instead, I want a single independent variable that takes its value from VariableA, VariableB, or VariableC depending on the value of alt (VariableA when alt is 1, VariableB when alt is 0, VariableC when alt is -1). This is shown as VariableD in the table below:
Observation Outcome VariableA VariableB VariableC alt VariableD
1 FALSE 1.27 0.20 0.81 -1 0.81
1 FALSE 1.27 0.20 0.81 0 0.20
1 TRUE 1.27 0.20 0.81 1 1.27
2 FALSE 0.20 0.45 0.70 -1 0.70
2 TRUE 0.20 0.45 0.70 0 0.45
2 FALSE 0.20 0.45 0.70 1 0.20
This would allow me to run the multinomial logit regression:
mlog <- mlogit(Outcome ~ 1 | VariableD, data=mlogitdataset, reflevel = "0")
I have tried to create VariableD directly within the mlogit object (mlogitdataset) using the following code:
outcome_map <- data.frame(alt = c(1, 0, -1), var = grep('Variable[A-C]', names(mlogitdataset)))
mlogitdataset$VariableD <- mlogitdataset[cbind(seq_len(nrow(mlogitdataset)), with(outcome_map, var[match(mlogitdataset$alt, alt)]))]
However, that gives me the error message "row names supplied are of the wrong length" when trying to run the multinomial logit regression.
How should I transform/format/structure the data so that I can run the intended regression using the mlogit function?
Thanks!
Upvotes: 2
Views: 983
Reputation: 501
You can use case_when() from dplyr together with mutate():
library(dplyr)

# Recreate the long-format data from the question as a plain data frame
mlogitdataset <- read.csv(text = "Observation,Outcome,VariableA,VariableB,VariableC,alt
1,FALSE,1.27,0.20,0.81,-1
1,FALSE,1.27,0.20,0.81,0
1,TRUE,1.27,0.20,0.81,1
2,FALSE,0.20,0.45,0.70,-1
2,TRUE,0.20,0.45,0.70,0
2,FALSE,0.20,0.45,0.70,1")

# Pick the source column for VariableD according to the value of alt
mlogitdataset <- mutate(mlogitdataset,
  VariableD = case_when(
    alt == -1 ~ VariableC,
    alt == 0 ~ VariableB,
    alt == 1 ~ VariableA
  ))
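If you would rather not depend on dplyr, the same per-row lookup can probably be done in base R with matrix indexing. This is only a rough sketch: it assumes a plain long-format data frame rather than the object returned by mlogit.data(), and the names long, col_for_alt, vals and col_idx are made up for illustration.

```r
# Toy long-format data mirroring the question's table (plain data frame).
long <- data.frame(
  Observation = rep(1:2, each = 3),
  VariableA   = rep(c(1.27, 0.20), each = 3),
  VariableB   = rep(c(0.20, 0.45), each = 3),
  VariableC   = rep(c(0.81, 0.70), each = 3),
  alt         = rep(c(-1, 0, 1), times = 2)
)

# Hypothetical lookup: which source column supplies VariableD for each alt.
col_for_alt <- c("-1" = "VariableC", "0" = "VariableB", "1" = "VariableA")

# Numeric matrix of the candidate columns, so (row, column) pairs can index it.
vals <- as.matrix(long[, c("VariableA", "VariableB", "VariableC")])

# For every row, find the column position named by its alt value.
col_idx <- match(col_for_alt[as.character(long$alt)], colnames(vals))

# Two-column matrix indexing picks exactly one cell per row.
long$VariableD <- vals[cbind(seq_len(nrow(long)), col_idx)]
```

Keeping the alt-to-column mapping in one named vector means a different assignment only needs a change in one place.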
Upvotes: 1