Kevin


How do inputs to UBL::SmoteClassif() influence the vector lengths passed to Fortran?

I'm using the UBL::SmoteClassif() function in R to over-sample minority classes and obtain a more balanced dataset. I have 8 classes. On one dataset with 357,038 rows and 147 columns (covariates) it works. On another dataset with 186,274 rows and 186 columns it produces the following error:

"Error in neighbours(tgt, dat, dist, p, k) : long vectors (argument 10) are not supported in .Fortran"
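Some background on the message: .C() and .Fortran() cannot pass long vectors, i.e. vectors with more than .Machine$integer.max = 2^31 - 1 = 2,147,483,647 elements, so argument 10 of the internal Fortran call must be longer than that. The small sketch below (fits_in_fortran() is a hypothetical helper, not part of UBL) shows that the full covariate matrix of the failing dataset is far below that bound, while a rows x rows matrix over all rows would be far above it:

```r
# .C()/.Fortran() reject any vector longer than this (2^31 - 1 elements)
fortran_max_len <- .Machine$integer.max

# Hypothetical helper: can a vector of length n be handed to .Fortran()?
fits_in_fortran <- function(n) n <= fortran_max_len

fits_in_fortran(186274 * 186)  # TRUE: all covariates as one double vector
fits_in_fortran(186274^2)      # FALSE: an n x n matrix over all rows
```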

Is there a formula that takes the number of columns in the dataset and the function's other parameter settings and returns the maximum number of rows the dataset can have for the function to work? That would help me scale my analysis.
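One possible way to frame such a formula, under an assumption that has not been checked against the Fortran source: SMOTE generally searches for nearest neighbours within each class it over-samples, so suppose argument 10 is an n_c x n_c double array (for example pairwise distances), where n_c is the number of rows in a class with C.perc > 1. The bound would then be:

```latex
% n_c = rows in one class with C.perc > 1 (assumed n_c x n_c array)
n_c^2 \le 2^{31} - 1 = 2\,147\,483\,647
\;\Longrightarrow\;
n_c \le \left\lfloor \sqrt{2^{31} - 1} \right\rfloor = 46\,340
```

Under this reading the column count p drops out entirely. If argument 10 were instead the n_c x p data matrix, the bound would be n_c <= floor((2^31 - 1) / p), roughly 11.5 million rows for p = 186, which no class here comes close to.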

Here is a reproducible example similar to what I am doing:

library(UBL)
library(tidyverse)

# Failing case: 186,274 rows, 186 random integer covariates, 8 classes
test <- data.frame(replicate(186, sample(0:10000, 186274, replace = TRUE)),
                   class = c(rep("Class_1", 15735),
                             rep("Class_2", 3767),
                             rep("Class_3", 9874),
                             rep("Class_4", 30670),
                             rep("Class_5", 1540),
                             rep("Class_6", 25109),
                             rep("Class_7", 84307),
                             rep("Class_8", 15272)))
test <- test %>%
  mutate(class = factor(class, levels = paste0("Class_", 1:8)))
# C.perc: values > 1 over-sample a class, 1 leaves it unchanged
l <- list(Class_1 = 1.15, Class_2 = 4.19, Class_3 = 1.55, Class_4 = 1.00,
          Class_5 = 9.81, Class_6 = 1.04, Class_7 = 1.01, Class_8 = 1.00)
datBal <- SmoteClassif(class ~ ., test, C.perc = l)  # error

# Working case: 357,038 rows, 186 random integer covariates, 8 classes
test <- data.frame(replicate(186, sample(0:10000, 357038, replace = TRUE)),
                   class = c(rep("Class_1", 31878),
                             rep("Class_2", 6406),
                             rep("Class_3", 31351),
                             rep("Class_4", 55430),
                             rep("Class_5", 1598),
                             rep("Class_6", 32293),
                             rep("Class_7", 176013),
                             rep("Class_8", 22069)))
test <- test %>%
  mutate(class = factor(class, levels = paste0("Class_", 1:8)))
l <- list(Class_1 = 1.14, Class_2 = 4.84, Class_3 = 1.00, Class_4 = 1.00,
          Class_5 = 18.2, Class_6 = 1.57, Class_7 = 1.00, Class_8 = 1.33)
datBal <- SmoteClassif(class ~ ., test, C.perc = l)  # this works
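A minimal R sketch that tests one hypothesis against the two examples above: that argument 10 is an n_c x n_c array built for each class with C.perc > 1, so a class would fail once its row count n_c exceeds floor(sqrt(2^31 - 1)) = 46,340. This is not verified against the Fortran source, and check_classes() is a hypothetical helper, not part of UBL:

```r
# Hypothesis (unverified): for every class with C.perc > 1, SmoteClassif()
# passes an n_c x n_c double array to Fortran, so n_c^2 must stay <= 2^31 - 1.
max_class_rows <- floor(sqrt(.Machine$integer.max))  # 46340

# Hypothetical helper: flag over-sampled classes that exceed the assumed limit
check_classes <- function(counts, C.perc) {
  perc <- unlist(C.perc)[names(counts)]
  data.frame(class = names(counts),
             rows = unname(counts),
             C.perc = unname(perc),
             too_big = perc > 1 & counts > max_class_rows,
             row.names = NULL, stringsAsFactors = FALSE)
}

# Class sizes and C.perc from the failing example
counts_fail <- c(Class_1 = 15735, Class_2 = 3767, Class_3 = 9874, Class_4 = 30670,
                 Class_5 = 1540, Class_6 = 25109, Class_7 = 84307, Class_8 = 15272)
perc_fail <- list(Class_1 = 1.15, Class_2 = 4.19, Class_3 = 1.55, Class_4 = 1.00,
                  Class_5 = 9.81, Class_6 = 1.04, Class_7 = 1.01, Class_8 = 1.00)

# Class sizes and C.perc from the working example
counts_ok <- c(Class_1 = 31878, Class_2 = 6406, Class_3 = 31351, Class_4 = 55430,
               Class_5 = 1598, Class_6 = 32293, Class_7 = 176013, Class_8 = 22069)
perc_ok <- list(Class_1 = 1.14, Class_2 = 4.84, Class_3 = 1.00, Class_4 = 1.00,
                Class_5 = 18.2, Class_6 = 1.57, Class_7 = 1.00, Class_8 = 1.33)

check_classes(counts_fail, perc_fail)  # too_big is TRUE only for Class_7
check_classes(counts_ok, perc_ok)      # too_big is FALSE for every class
```

If the hypothesis holds, it would explain why the smaller dataset fails: its Class_7 (84,307 rows, C.perc = 1.01) is over-sampled, whereas in the larger dataset the two classes above 46,340 rows (Class_7 with 176,013 and Class_4 with 55,430) have C.perc = 1.00 and are left unchanged.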

Link to SmoteClassif source code

Link to neighbours() source code

Link to foreign function interface
