Luis
Luis

Reputation: 1584

Multinomial regression (different results -- same dataset, R vs SPSS). nnet package -- multinom function

Recently, I had to work with R and SPSS to analyze a dataset with Multinomial Regression framework. We surveyed some participants (10-12 years old) and we asked which "professional field" they like the most, then we asked how often they accessed the internet. So, the outcome is a Categorical variable": professional field -- "military", "I don't know", and "Other profession"; and the independent variable is also a categorical variable (how often do you access the internet ("I don't have access", "1-3 hours/day", "3-5 hours/day").

I ran a model using R (with nnet package, via multinom function), and other statistician ran using SPSS. All reference category was defined correctly.

Now, when we compare the results, they don't agree for the second category of my independent variable. The first one is ok.

Please have a look at the entire code:

library(tidyverse)
library(stargazer)
library(nnet)

ds <- ds %>% mutate(internet = factor(internet))
ds <- ds %>% mutate(internet = relevel(internet, ref = "I dont have internet access"))

ds <- ds %>% mutate(field = factor(field))
ds <- ds %>% mutate(fielf = relevel(field, ref = "I dont know"))

mod <- multinom(field ~ internet, data = ds, maxit=1000, reltol = 1.0e-9)
stargazer(mod, type = 'text')

and the SPSS result SPSS results

For the sake of clarity, when the independent variable has only two categories (such as sex, male and female), both R and SPSS agree with its results

SPSS results 2

After a huge effort trying to understand the discrepancy between both results, I read that nnet estimation could have some problems (optimization problem ?), and that the discrepancy of results is not so strange as I was thinking at the beginning..

Can someone explain to me what's going on here? What am I missing?! I assume SPSS and R must have the same results if we are running the same model.

Thank you

That's the ds I'm using in this example:

ds <- structure(list(sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                             2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
                                             2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                             2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
                                             1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
                                             2L, 1L), .Label = c("male", "female"), class = "factor"), internet = structure(c(3L, 
                                                                                                                              3L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L, 
                                                                                                                              2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 
                                                                                                                              2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 2L, 2L, 
                                                                                                                              2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 2L, 
                                                                                                                              2L, 1L, 3L, 2L, 2L, 3L, 2L, 2L), .Label = c("I dont have internet access", 
                                                                                                                                                                          "1-3 hours/day", "3-5 hours/day"), class = "factor"), field = structure(c(1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("I dont know", "Military", 
                                                                                                                                                                                                                                                                                                "Other profession"), class = "factor")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                                                                                                             -73L))

Upvotes: 3

Views: 548

Answers (1)

jay.sf
jay.sf

Reputation: 73612

You could use mlogit alternatively which resembles the SPSS results more closely. The SPSS values should be valid, since Stata yields similar results (-14.88 (982.95), 11.58 (982.95), 11.44 (982.95)). The remaining deviations might stem from the ridiculous significance of "other profession".

library(mlogit)
ml.dat <- mlogit.data(ds, choice="field", shape="wide")
ml <- mlogit(field ~ 1 | internet, data=ml.dat)

Yielding

texreg::screenreg(ml)
=========================================================
                                                Model 1  
---------------------------------------------------------
Military:(intercept)                               -0.41 
                                                   (0.91)
Other profession:(intercept)                      -16.89 
                                                (2690.89)
Military:factor(internet)1-3 hours/day             -1.50 
                                                   (1.06)
Other profession:factor(internet)1-3 hours/day     13.60 
                                                (2690.89)
Military:factor(internet)3-5 hours/day             -1.64 
                                                   (1.06)
Other profession:factor(internet)3-5 hours/day     13.46 
                                                (2690.89)
---------------------------------------------------------
AIC                                                85.49 
Log Likelihood                                    -36.74 
Num. obs.                                          73    
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

Upvotes: 0

Related Questions