Jesse001

Reputation: 924

weird list size in R, explanation?

I'm at a total loss on this one. I have a large, though not unreasonable, data frame in R (48000 × 19). I'm trying to use sm.ancova() to investigate differences in effect slopes, but got

error: cannot allocate vector of size 13.1GB

13 GB overtaxed the memory allocated to R, I get that. But... what?! The entire CSV file I read in was only 24,000 KB. Why are these single vectors so huge in R?

The ANCOVA code I'm using is:

library(sm)   ## sm.ancova() comes from the sm package
data1 <- read.csv("data.csv")
attach(data1)
sm.ancova(s, dt, dip, model = "none")

Looking into it a bit, I used:

diag(s)
length(s)
diag(dt)
length(dt)
diag(dip)    
length(dip)

All of the diag() calls gave the same error; the lengths are all 48000.

Any explanation would help. A fix would be better :)

Thanks in advance!

A dummy data link that reproduces this problem can be found at: https://www.dropbox.com/s/dxxofb3o620yaw3/stackexample.csv?dl=0

Upvotes: 1

Views: 352

Answers (1)

Ben Bolker

Reputation: 226971

Get data:

## CSV file is 10M on disk, so it's worth using a faster method
##   than read.csv() to import ...
data1 <- data.table::fread("stackexample.csv",data.table=FALSE)
dd <- data1[,c("s","dt","dip")]  ## keep just the columns used in the model

If you give diag() a vector, it will try to make a diagonal matrix with that vector on the diagonal. The example data set you linked is 96,000 rows long, so diag() applied to any of those columns will try to construct a 96,000 × 96,000 matrix. A 1000 × 1000 matrix is

format(object.size(diag(1000)),"Mb")  ## 7.6 Mb

so the matrix you're trying to construct here will be 96^2 * 7.6 / 1024 ≈ 68 Gb.

A 24K × 24K matrix would be 16 times smaller, but still about 4 Gb ...
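
To see where these numbers come from: an n × n matrix of doubles takes n^2 * 8 bytes, so you can predict the allocation before attempting it. A quick sketch (the helper name is just for illustration):

dense_size_gb <- function(n) n^2 * 8 / 2^30  ## n x n doubles, in GiB
dense_size_gb(96000)   ## ~68.7, matching the ~68 Gb above
dense_size_gb(24000)   ## ~4.3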

It is possible to use sparse matrices to construct big diagonal matrices:

library(Matrix)
object.size(Diagonal(x=1:96000))
## 769168 bytes
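
As a small illustration (not part of the original answer), a sparse diagonal matrix from Matrix still behaves like an ordinary matrix in arithmetic:

D <- Diagonal(x = 1:5)  ## 5 x 5 diagonal matrix, stored sparsely
D %*% (1:5)             ## same result as diag(1:5) %*% 1:5, without the dense storage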

More generally, not all analysis programs are written with computational efficiency (either speed or memory) in mind. The papers on which this method is based (see ?sm.ancova) were written in the late 1990s, when 24,000 observations would have constituted a huge data set ...

Upvotes: 4
