elpavlos

Reputation: 35

Unable to perform logistic regression in R

I am trying out logistic regression on a data.frame (11359 rows, 137 columns). The data.frame contains Y (one dependent variable) and the predictors (136 independent variables). All the variables are binary.

The formula I built from the "my_data" data.frame is f = as.formula(paste('y ~', paste(colnames(my_data)[c(3:52, 54:133, 138:143)], collapse = '+'))). I then tried glm, logistf and pmlr, with the following results.
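For illustration, the same kind of formula can be built with base R's reformulate(), which avoids pasting strings by hand. This is only a sketch on a made-up data frame (toy and its column names are hypothetical, not the poster's data). It also shows that selecting column positions past the last column returns NA names, which would be worth checking here given that the data.frame is described as having 137 columns while the index goes up to 143.

```r
# Toy data frame standing in for my_data (hypothetical column names).
set.seed(1)
toy <- data.frame(
  id = 1:20,
  y  = rbinom(20, 1, 0.5),
  x1 = rbinom(20, 1, 0.5),
  x2 = rbinom(20, 1, 0.5),
  x3 = rbinom(20, 1, 0.5)
)

# Select predictors by position. Positions 6 and 7 do not exist in toy,
# so colnames() returns NA for them.
idx <- c(3:4, 5:7)
predictors <- colnames(toy)[idx]
print(predictors)

# Drop the NA names before building the formula.
predictors <- predictors[!is.na(predictors)]

# reformulate() assembles response ~ term1 + term2 + ... from a character vector.
f <- reformulate(predictors, response = "y")
print(f)
```

The resulting f can be passed to glm() exactly like a formula created with as.formula().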

The glm function estimates some parameters but gives a warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred. I figured out that this warning is caused by a separation issue, so I tried the logistf and pmlr functions.
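Since all variables are binary, one rough way to look for separation is to cross-tabulate each predictor against the outcome and flag any 2x2 table with an empty cell. The following is a minimal sketch on simulated data (toy, x1, x2 and has_empty_cell are made-up names). It only catches separation caused by a single predictor; separation produced by a combination of predictors would not show up this way.

```r
# Simulated binary data (hypothetical, for illustration only).
set.seed(42)
n <- 200
toy <- data.frame(
  y  = rbinom(n, 1, 0.5),
  x1 = rbinom(n, 1, 0.5)
)
# x2 can be 1 only when y is 1, so the (x2 = 1, y = 0) cell is always empty.
toy$x2 <- ifelse(toy$y == 1, rbinom(n, 1, 0.3), 0L)

# TRUE if the 2x2 table of a binary predictor against y has a zero cell.
has_empty_cell <- function(x, y) {
  tab <- table(factor(x, levels = 0:1), factor(y, levels = 0:1))
  any(tab == 0)
}

predictors <- setdiff(names(toy), "y")
flagged <- predictors[vapply(
  predictors,
  function(v) has_empty_cell(toy[[v]], toy$y),
  logical(1)
)]
print(flagged)
```

Predictors flagged this way are candidates for the cause of the glm.fit warning.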

With logistf, I got neither results nor an error after 50 hours, so I decided to terminate the process (CPU usage 23-27%, RAM usage approx. 1100 MB during the first 10 hours, then 2-3 MB).

For pmlr, I got this Error: cannot allocate vector of size 28.9 Gb.

I also tried logistf and pmlr with only 10 of the 137 variables, to check whether the number of predictors was the problem, and got the same behavior. logistf ran "forever", and pmlr gave the same type of error with a different vector size (bigger than before; approx. 45 Gb, if I recall correctly).

Should I upgrade my laptop's RAM to perform this calculation, look for other functions (are there other packages for penalized logistic regression?), or is it a different kind of problem, e.g. too many variables?

Windows 10 x64, Processor: i3 2.4 GHz, RAM: 8.00 GB, R version: x64 3.4.0, RStudio: 1.0.143.

Upvotes: 1

Views: 1336

Answers (1)

Ajay Ohri

Reputation: 3492

See the biglm package: https://cran.r-project.org/web/packages/biglm/biglm.pdf and https://www.rdocumentation.org/packages/biglm/versions/0.9-1/topics/biglm

biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory.

bigglm creates a generalized linear model object that uses only p^2 memory for p variables.

bigglm Usage

bigglm(formula, data, family=gaussian(),...)
## S3 method for class 'data.frame'
bigglm(formula, data,...,chunksize=5000)
## S3 method for class 'function'
bigglm(formula, data, family=gaussian(),
       weights=NULL, sandwich=FALSE, maxit=8, tolerance=1e-7,
       start=NULL,quiet=FALSE,...)
## S3 method for class 'RODBC'
bigglm(formula, data, family=gaussian(),
       tablename, ..., chunksize=5000)
## S4 method for signature 'ANY,DBIConnection'
bigglm(formula, data, family=gaussian(),
       tablename, ..., chunksize=5000)
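As a rough sketch of how the data.frame method could be used for a logistic regression, the call below passes family = binomial() and a chunksize. The data are simulated (toy, x1, x2 and the coefficients are made up for illustration), and the code only runs if biglm is installed. Note that bigglm fits by ordinary maximum likelihood in chunks, so it targets the memory problem rather than the separation warning.

```r
# Requires the biglm package; skipped quietly if it is not installed.
if (requireNamespace("biglm", quietly = TRUE)) {
  # Simulated binary predictors and outcome (hypothetical data).
  set.seed(7)
  n <- 2000
  toy <- data.frame(
    x1 = rbinom(n, 1, 0.5),
    x2 = rbinom(n, 1, 0.5)
  )
  toy$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * toy$x1 - 0.7 * toy$x2))

  # family = binomial() gives a logistic regression; the data frame is
  # processed in chunks of 500 rows instead of all at once.
  fit <- biglm::bigglm(y ~ x1 + x2, data = toy,
                       family = binomial(), chunksize = 500)
  print(summary(fit))
}
```

With the poster's data, the formula f and my_data would take the place of the toy formula and data frame.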

Upvotes: 1
