carsentdum

Reputation: 67

mlogit data transformation, R

I have a dataset that looks like this:

Observation  Outcome  VariableA  VariableB   VariableC
     1          1         1.27       0.2         0.81        
     2          0         0.20       0.45        0.70           
     3         -1         0.27       1.2         0.56 

The Outcome variable takes the values 1, 0, or -1 and is the dependent variable in a multinomial logit model that I want to fit in R using the mlogit package. I have transformed my data using the following code:

mlogitdataset <- mlogit.data(dataset, choice = "Outcome", shape="wide")

which gives me the following new dataset:

Observation  Outcome VariableA  VariableB  VariableC   alt
     1        FALSE       1.27       0.2        0.81   -1     
     1        FALSE       1.27       0.2        0.81    0      
     1         TRUE       1.27       0.2        0.81    1
     2        FALSE       0.20       0.45       0.70   -1
     2         TRUE       0.20       0.45       0.70    0   
     2        FALSE       0.20       0.45       0.70    1

This is essentially how I want the data to be structured. However, I do not want to use VariableA, VariableB and VariableC as separate independent variables in the multinomial logit regression. Instead, I want a single independent variable whose value is taken from VariableA, VariableB or VariableC depending on the value of alt (VariableA when alt is 1, VariableB when alt is 0, VariableC when alt is -1). This is shown as VariableD in the table below:

 Observation  Outcome VariableA  VariableB  VariableC   alt  VariableD
     1        FALSE       1.27       0.20       0.81   -1       0.81
     1        FALSE       1.27       0.20       0.81    0       0.20
     1         TRUE       1.27       0.20       0.81    1       1.27
     2        FALSE       0.20       0.45       0.70   -1       0.70
     2         TRUE       0.20       0.45       0.70    0       0.45
     2        FALSE       0.20       0.45       0.70    1       0.20

This would allow me to run the multinomial logit regression:

mlog <- mlogit(Outcome ~ 1 | VariableD, data=mlogitdataset, reflevel = "0") 

I have tried to create VariableD directly within the mlogit object (mlogitdataset) using the following code:

outcome_map <- data.frame(alt = c(1, 0, -1), var = grep('Variable[A-C]', names(mlogitdataset)))

mlogitdataset$VariableD <- mlogitdataset[cbind(seq_len(nrow(mlogitdataset)), with(outcome_map, var[match(mlogitdataset$alt, alt)]))]

However, that gives me the error message "row names supplied are of the wrong length" when trying to run the multinomial logit regression.

How should I transform/format/structure the data so that I can run the intended regression using the mlogit function?

Thanks!

Upvotes: 2

Views: 983

Answers (1)

fujiu

Reputation: 501

You can use case_when() from dplyr together with mutate() to pick the value of VariableD row by row, based on alt:

library(dplyr)

mlogitdataset <- read.csv(text = "Observation,Outcome,VariableA,VariableB,VariableC,alt
1,FALSE,1.27,0.20,0.81,-1
1,FALSE,1.27,0.20,0.81,0
1,TRUE,1.27,0.20,0.81,1
2,FALSE,0.20,0.45,0.70,-1
2,TRUE,0.20,0.45,0.70,0
2,FALSE,0.20,0.45,0.70,1")

# For each row, take VariableD from the column that corresponds to alt
mlogitdataset <- mutate(mlogitdataset,
       VariableD = case_when(
         alt == -1 ~ VariableC,
         alt ==  0 ~ VariableB,
         alt ==  1 ~ VariableA
       ))
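
If you would rather not depend on dplyr, the same mapping can be sketched in base R with a lookup vector and matrix indexing. This is only a minimal sketch on a plain data frame that mirrors the table in the question. The names long, alt_to_col, src_col and vals are made up for illustration, and I have not checked how this behaves when applied directly to an object returned by mlogit.data().

```r
# Toy long-format data mirroring the table in the question (plain data.frame)
long <- data.frame(
  Observation = c(1, 1, 1, 2, 2, 2),
  Outcome     = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
  VariableA   = c(1.27, 1.27, 1.27, 0.20, 0.20, 0.20),
  VariableB   = c(0.20, 0.20, 0.20, 0.45, 0.45, 0.45),
  VariableC   = c(0.81, 0.81, 0.81, 0.70, 0.70, 0.70),
  alt         = c(-1, 0, 1, -1, 0, 1)
)

# Hypothetical lookup: which column supplies VariableD for each value of alt
alt_to_col <- c("-1" = "VariableC", "0" = "VariableB", "1" = "VariableA")

# Source column name for every row; as.character() also handles a factor alt
src_col <- alt_to_col[as.character(long$alt)]

# Numeric matrix of the candidate columns, then pick cell (row i, column of src_col[i])
vals <- as.matrix(long[, c("VariableA", "VariableB", "VariableC")])
long$VariableD <- vals[cbind(seq_len(nrow(long)), match(src_col, colnames(vals)))]
```

The lookup vector keeps the alt-to-column mapping in one place, so adding another alternative only means adding one entry to alt_to_col and one column name to vals.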

Upvotes: 1
