Reputation: 815
Looking for a workaround: is there a more robust and faster alternative to polr() for fitting high-dimensional data in an ordinal context? (Similar to the relationship between lm() and .lm.fit().)
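To illustrate the kind of low-overhead interface meant by the lm() / .lm.fit() comparison, here is a minimal sketch on made-up data (not from the original post): .lm.fit() skips formula parsing and works directly on a model matrix.
# Sketch only: same least-squares fit via the formula interface and the bare-bones fitter
set.seed(1)
X <- cbind(1, matrix(rnorm(1000 * 5), ncol = 5))  # model matrix with an intercept column
y <- rnorm(1000)
coef(lm(y ~ X[, -1]))          # formula interface: convenient, but more overhead
.lm.fit(X, y)$coefficients     # low-level fitter: no formula handling, faster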
Example datasets: https://filebin.net/e1qz05qy9qo6zpwa
library(tictoc)
library(MASS)
custom_data <- read.csv(file.choose())
tic()
polr(LH_info ~ ., data = custom_data[,1:100])
toc() #0.61 seconds
ADDED EDIT: Issues found using the current polr and orm approaches:
Specifically Using this dataset for orm issues: https://filebin.net/hnpbkrw4gc9a5pn9
custom_data2 <- read.csv(file.choose())
custom_data2$OC_info <- factor(custom_data2$OC_info, ordered = TRUE,
                               levels = c("Extreme Low Open Close (<-40)", "Common Lower Open Close (-40-0)",
                                          "Common Higher Open Close (0-40)", "Extreme High Open Close (>40)"))
test_model <- orm(OC_info ~ ., data = custom_data2[,1:101])
test_model2 <- orm(OC_info ~ ., data = custom_data2[,1:102])
Specifically Using this dataset for polr issues: https://filebin.net/hg7irb8al8pfs9sd
custom_data3 <- read.csv(file.choose())
custom_data3$OC_info <- factor(custom_data3$OC_info, ordered = TRUE,
                               levels = c("Extreme Low Open Close (<-40)", "Common Lower Open Close (-40-0)",
                                          "Common Higher Open Close (0-40)", "Extreme High Open Close (>40)"))
test_model3 <- polr(OC_info ~ ., data = custom_data3)
polr: Error in optim(s0, fmin, gmin, method = "BFGS", ...) : initial value in 'vmmin' is not finite
--> happens occasionally with some combinations of independent variables
orm: Error in .local(x, ...) : Increase tmpmax
--> this always happens when trying to model the dataset with 100 or more independent variables
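One possible workaround for the vmmin error (a hedged sketch, not from the original post, and untested on this particular dataset): polr() accepts a start argument with initial values for the coefficients followed by the intercepts, and supplying finite, strictly increasing intercept starts sometimes lets the optimiser get going; scaling the predictors can also help.
# Sketch only: explicit starting values for polr (coefficients first, then intercepts).
# Assumes OC_info is the response and all remaining columns are numeric predictors.
n_pred <- sum(names(custom_data3) != "OC_info")     # one coefficient per numeric predictor
n_int  <- nlevels(custom_data3$OC_info) - 1         # number of intercepts (cut points)
test_model3 <- polr(OC_info ~ ., data = custom_data3,
                    start = c(rep(0, n_pred), seq(-1, 1, length.out = n_int)))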
Upvotes: 1
Views: 671
Reputation: 41367
You could use the clm function from the ordinal package or the orm function from the rms package to fit an ordinal regression. Both also offer lower-level *.fit interfaces (a sketch of clm.fit follows the benchmark below). Since you want to check the speed, here is a benchmark:
library(microbenchmark)
library(MASS)
library(ordinal)
library(rms)
set.seed(7)
custom_data <- read.csv("dataset_example.csv")
custom_data$LH_info <- as.factor(custom_data$LH_info)
m = microbenchmark(
"polr" = {
polr(LH_info ~ ., data = custom_data[,1:100])
},
"clm" = {
clm(LH_info ~ ., data = custom_data[,1:100])
},
"orm" = {
orm(LH_info ~ ., data = custom_data[,1:100])
}, times = 100
)
m
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> polr 174.6823 183.0839 194.1672 188.6606 195.7334 327.6748 100 a
#> clm 340.8700 354.7288 365.2914 360.8585 366.6671 485.0190 100 c
#> orm 251.0034 261.5099 276.0913 266.3175 273.9440 405.5983 100 b
library(ggplot2)
autoplot(m)
Created on 2023-02-03 with reprex v2.0.2
Your polr option is already pretty fast.
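For completeness, here is a minimal sketch of the clm.fit interface mentioned above, following the pattern in the clm.fit help page; treating method = "model.frame" as the way to obtain y and X here is an assumption about this dataset, not something from the original answer.
# Sketch only: build the response and design matrix once, then use the low-level fitter
mf <- clm(LH_info ~ ., data = custom_data[, 1:100], method = "model.frame")  # no fitting, just y and X
fit_fast <- clm.fit(mf$y, mf$X)   # skips formula and model-frame processing on each call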
More information about both functions:
ordinal package: "Cumulative Link Models for Ordinal Regression with the R Package ordinal"
orm function (Ordinal Regression Model) in the rms package
Upvotes: 1