crazyaboutliv

Reputation: 3199

Regression in R -- 4 features, 4 million instances

I have a text file of the form (User Id, Movie Id, Ratings, Time) and I want to run a vanilla regression on the dataset (just 4 features, more than 4 million instances).

model <- glm(UserId ~ MovieId + Ratings + Time, data = <name>)

It gave an error:

ERROR: cannot allocate 138.5MB vector

The file is only 93MB. How do I run a regression in R without hitting memory problems? Should I store the data differently?

Thanks.

Some more info: I am working on a Linux box with 3GB of RAM. I have googled around, but most of the links I found deal with datasets that are larger than RAM, which is not true in my case (just 93MB).

Upvotes: 7

Views: 2543

Answers (3)

bright-star

Reputation: 6447

That R error message doesn't report the total amount of memory in use; it reports the size of the last chunk R tried to allocate and failed on. You may want to try profiling the memory usage (see the question "Monitor memory usage in R") to see what's really going on.
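As a rough illustration of that kind of check, here is a minimal sketch using only base R (`gc()` and `object.size()`). The data frame is made up for the example and is much smaller than the one in the question; the point is only to show where the numbers come from, not to reproduce the original failure.

```r
# Hypothetical data: 100,000 rows, three numeric columns
n <- 1e5
d <- data.frame(MovieId = sample(1:20, n, replace = TRUE),
                Ratings = runif(n),
                Time = runif(n, 45, 180))

gc(reset = TRUE)  # reset the "max used" counters before the step of interest
m <- model.matrix(~ MovieId + Ratings + Time, data = d)

print(object.size(d), units = "MB")  # memory held by the data frame
print(object.size(m), units = "MB")  # memory held by the model matrix
mem <- gc()       # the "max used" columns show the peak since the reset
print(mem)
```

Comparing the peak reported by `gc()` with the sizes of the individual objects gives a hint about how many temporary copies a step creates.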

Upvotes: 2

Tommy

Reputation: 40861

The model matrix required has the same number of rows as your data, but the number of columns in it is roughly the number of unique strings (factor levels)!

So if you have 1000 movies, that will generate roughly a 4e6 x 1000 matrix of doubles, at 8 bytes per double. That's around 32 GB...
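The estimate can be checked with a few lines of arithmetic. The figure of 1000 movies is the illustrative number used above, not something taken from the actual dataset.

```r
n_rows <- 4e6         # instances in the dataset
n_cols <- 1000        # illustrative number of movie levels
bytes_per_double <- 8

bytes <- n_rows * n_cols * bytes_per_double
bytes / 1e9           # 32 (decimal gigabytes)
bytes / 2^30          # about 29.8 (binary gigabytes, GiB)
```

Either way, the result is roughly ten times the 3GB of RAM on the machine in question.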

You can try to generate the model matrix separately like this:

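# Sample of 100 rows, 10 users, 20 movies
d <- data.frame(UserId = rep(paste('U', 1:10), each = 10),
                MovieId = sample(paste('M', 1:20), 100, replace = TRUE),
                Ratings = runif(100), Time = runif(100, 45, 180))
dim(d) # 100 x 4
# MovieId expands into one dummy column per level beyond the first
m <- model.matrix(~ MovieId + Ratings + Time, data = d)
dim(m) # 100 x 22 when all 20 movies are sampled: intercept + 19 dummies + Ratings + Time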

Upvotes: 4

NPE

Reputation: 500713

biglm is a package specifically designed for fitting regression models to large data sets.

It works by processing the data block-by-block. The amount of memory it requires is a function of the number of variables, but is not a function of the number of observations.
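To make the block-by-block idea concrete, here is a minimal base-R sketch that accumulates the cross-products X'X and X'y one chunk at a time and then solves the normal equations. This is not how biglm is implemented (it relies on a more numerically stable incremental decomposition); it is a simplified stand-in to show why memory depends on the number of variables rather than the number of rows. The data, the column names `x1`, `x2`, `y`, and the chunk size are all made up for the example.

```r
set.seed(1)

# Hypothetical data: 10,000 rows, two numeric predictors
n <- 10000
d <- data.frame(x1 = rnorm(n), x2 = runif(n))
d$y <- 2 + 3 * d$x1 - 1.5 * d$x2 + rnorm(n, sd = 0.1)

# Split the rows into blocks of 1000; in practice each block
# would be read from disk instead of held in memory together
chunks <- split(d, ceiling(seq_len(n) / 1000))

p <- 3                      # intercept + 2 predictors
xtx <- matrix(0, p, p)      # running X'X, always p x p
xty <- numeric(p)           # running X'y, always length p
for (chunk in chunks) {
  X <- model.matrix(~ x1 + x2, data = chunk)
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, chunk$y)
}

beta_chunked <- drop(solve(xtx, xty))
beta_full <- coef(lm(y ~ x1 + x2, data = d))  # reference fit on all rows
print(rbind(beta_chunked, beta_full))
```

With the biglm package installed, the equivalent workflow would be to fit the first chunk with `biglm()` and then feed the remaining chunks through `update()`.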

Upvotes: 9
