Reputation: 165
This differs from a previous question on the same error, which is why I'm asking separately. I have an already cleaned dataset of 120,000 observations of 25 variables, which I am supposed to analyze with logistic regression and a random forest. However, I get the error "cannot allocate vector of size 98 GB", whereas my friend doesn't.
The summary above says most of it. I even tried reducing the dataset to 50,000 observations and 15 variables (of which I use 5 in the regression), and it still failed. However, when I sent the script with the shortened dataset to a friend, she could run it. This is odd because I have a 64-bit system with 8 GB of RAM, while she has only 4 GB. So it appears that the problem lies with my setup.
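In case it helps, here is a minimal sketch of how one could check how much memory the loaded data frame itself actually occupies and what R has already allocated (assuming the data has been read into pd_data as below):

# How large is the data frame in memory?
format(object.size(pd_data), units = "auto")

# Force a garbage collection and report memory currently used by R
gc()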
# Read the cleaned dataset (read.csv2 assumes semicolon-separated values with comma decimals)
pd_data <- read.csv2("pd_data_v2.csv")

# 70/30 train/test split with rsample
split <- rsample::initial_split(pd_data, prop = 0.7)
train <- rsample::training(split)
test <- rsample::testing(split)

# Logistic regression (note: fitted on the full pd_data here, not on train)
log_model <- glm(default ~ profit_margin + EBITDA_margin + payment_reminders, data = pd_data, family = "binomial")
log_model
The result should be a logistic regression model whose coefficients I can inspect and whose accuracy I can measure, so that I can then make adjustments.
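For context, this is roughly what I expect to be able to do with the fitted model, sketched with an assumed 0.5 cutoff and assuming default is coded 0/1:

# Inspect coefficients, standard errors and significance
summary(log_model)

# Predicted default probabilities on the held-out test set
pred_prob <- predict(log_model, newdata = test, type = "response")

# Simple accuracy with an assumed 0.5 cutoff (default assumed to be 0/1)
pred_class <- ifelse(pred_prob > 0.5, 1, 0)
mean(pred_class == test$default)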
Upvotes: 0
Views: 230