Jovan

Reputation: 815

More robust and faster polr for ordinal regression

I'm looking for a workaround: is there a more robust and faster alternative to polr for fitting ordinal models on high-dimensional data? (Similar to the relationship between lm() and .lm.fit().)

Example datasets: https://filebin.net/e1qz05qy9qo6zpwa

library(tictoc)
library(MASS)
custom_data <- read.csv(file.choose())
custom_data$LH_info <- factor(custom_data$LH_info) # polr needs a factor response
tic()
polr(LH_info ~ ., data = custom_data[,1:100])
toc() #0.61 seconds

EDIT: issues found with the current polr and orm approaches:

Specifically Using this dataset for orm issues: https://filebin.net/hnpbkrw4gc9a5pn9

custom_data2 <- read.csv(file.choose())
custom_data2$OC_info <- factor(custom_data2$OC_info, ordered = TRUE,
                            levels=c("Extreme Low Open Close (<-40)","Common Lower Open Close (-40-0)", 
                                     "Common Higher Open Close (0-40)","Extreme High Open Close (>40)"))
test_model <- orm(OC_info ~ ., data = custom_data2[,1:101])
test_model2 <- orm(OC_info ~ ., data = custom_data2[,1:102])

Specifically Using this dataset for polr issues: https://filebin.net/hg7irb8al8pfs9sd

custom_data3 <- read.csv(file.choose())
custom_data3$OC_info <- factor(custom_data3$OC_info, ordered = TRUE,
                            levels=c("Extreme Low Open Close (<-40)","Common Lower Open Close (-40-0)", 
                                     "Common Higher Open Close (0-40)","Extreme High Open Close (>40)"))
test_model3 <- polr(OC_info ~ ., data = custom_data3)
  1. polr: Error in optim(s0, fmin, gmin, method = "BFGS", ...): initial value in 'vmmin' is not finite. This happens occasionally with some combinations of independent variables.

  2. orm: Error in .local(x, ...): Increase tmpmax. This always happens when fitting a dataset with 100 or more independent variables.
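For the polr "initial value in 'vmmin' is not finite" error, one common workaround is to pass explicit starting values through polr's `start` argument, since the default starting point can give a non-finite initial deviance. A minimal sketch on synthetic data (the dataset and starting values here are illustrative, not from the question):

```r
library(MASS)

# Sketch of a workaround for
# "initial value in 'vmmin' is not finite":
# supply starting values via polr's `start` argument,
# in the order c(coefficients, intercepts).
set.seed(1)
d <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
d$y <- factor(sample(c("low", "mid", "high"), 200, replace = TRUE),
              levels = c("low", "mid", "high"), ordered = TRUE)

p <- 5                      # number of predictors
k <- nlevels(d$y)           # number of response levels
# zeros for the slopes plus increasing, roughly spaced cut-points
start_vals <- c(rep(0, p), seq(-1, 1, length.out = k - 1))
fit <- polr(y ~ ., data = d, start = start_vals)
```

With real data, centring and scaling the predictors before fitting often avoids the non-finite starting value in the first place.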

Upvotes: 1

Views: 671

Answers (1)

Quinten

Reputation: 41367

You could use the clm function from the ordinal package or the orm function from the rms package to fit an ordinal regression. Both also expose lower-level *.fit functions (clm.fit and orm.fit) that skip the formula machinery. Since you want to check the speed, here is a benchmark:
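As a sketch of the lower-level route, orm.fit takes a predictor matrix and the response directly, analogous to .lm.fit() versus lm(). This is a hedged example on synthetic data; check the rms documentation for the exact argument conventions in your version:

```r
library(rms)

# Hedged sketch: rms::orm.fit() takes a predictor matrix `x`
# (no intercept column) and the response `y` directly, avoiding
# the formula/model.frame overhead of orm().
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
y <- factor(sample(1:4, 200, replace = TRUE), ordered = TRUE)
fit <- orm.fit(x, y)
```

For many predictors, building the design matrix once with model.matrix() and reusing it across fits saves further time.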

library(microbenchmark)
library(MASS)
library(ordinal)
library(rms)

set.seed(7)
custom_data <- read.csv("dataset_example.csv")
custom_data$LH_info <- as.factor(custom_data$LH_info)

m = microbenchmark(
  "polr" = {
    polr(LH_info ~ ., data = custom_data[,1:100])
  },
  "clm" = {
    clm(LH_info ~ ., data = custom_data[,1:100])
  }, 
  "orm" = {
    orm(LH_info ~ ., data = custom_data[,1:100])
  }, times = 100
)

m
#> Unit: milliseconds
#>  expr      min       lq     mean   median       uq      max neval cld
#>  polr 174.6823 183.0839 194.1672 188.6606 195.7334 327.6748   100 a  
#>   clm 340.8700 354.7288 365.2914 360.8585 366.6671 485.0190   100   c
#>   orm 251.0034 261.5099 276.0913 266.3175 273.9440 405.5983   100  b
library(ggplot2)
autoplot(m)

Created on 2023-02-03 with reprex v2.0.2

Your polr option is already pretty fast.


More information about both functions can be found in the documentation of the ordinal and rms packages.

Upvotes: 1
