Reputation: 924
I'm at a total loss on this one. I have a large, though not unreasonable, data frame in R (48,000 rows x 19 columns). I'm trying to use sm.ancova() to investigate differences in effect slopes, but got
Error: cannot allocate vector of size 13.1 Gb
A 13 GB allocation overtaxes the memory available to R, I get that. But... what?! The entire CSV file I read in was only 24,000 KB. Why are these single vectors so huge in R?
The ancova code I'm using is:
data1<-read.csv("data.csv")
attach(data1)
sm.ancova(s,dt,dip,model="none")
Looking into it a bit, I used:
diag(s)
length(s)
diag(dt)
length(dt)
diag(dip)
length(dip)
The diag() calls all gave the same error; the lengths are all 48,000.
Any explanation would help. A fix would be better :)
Thanks in advance!
A dummy data link that reproduces this problem can be found at: https://www.dropbox.com/s/dxxofb3o620yaw3/stackexample.csv?dl=0
Upvotes: 1
Views: 352
Reputation: 226971
Get data:
## CSV file is 10M on disk, so it's worth using a faster method
## than read.csv() to import ...
data1 <- data.table::fread("stackexample.csv",data.table=FALSE)
dd <- data1[,c("s","dt","dip")]
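As a quick sanity check (a sketch, assuming the three columns s, dt and dip are all numeric), the imported data themselves are tiny:
dim(dd)                         ## 96,000 rows x 3 columns
format(object.size(dd), "Mb")   ## about 2 Mb: 96000 * 3 doubles at 8 bytes each
So the huge allocation doesn't come from the data on disk; it comes from what gets built from those columns.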
If you give diag() a vector, it's going to try to make a diagonal matrix with that vector on the diagonal. The example data set you gave us is 96,000 rows long, so diag() applied to any of its columns will try to construct a 96,000 x 96,000 matrix. A 1000 x 1000 matrix is
format(object.size(diag(1000)),"Mb") ## 7.6 Mb
so the matrix you're trying to construct here will be 96^2 * 7.6/1024 ≈ 68 Gb (96,000 is 96 times 1000, and the memory grows with the square of the dimension).
A 24K x 24K matrix would be 16 times smaller but still about 4 Gb ...
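You can get the same estimate straight from the element size (a minimal sketch: a numeric matrix stores one 8-byte double per entry):
n <- 96000
n^2 * 8 / 1024^3    ## ~68.7 GiB for a 96,000 x 96,000 double matrix
n <- 24000
n^2 * 8 / 1024^3    ## ~4.3 GiB for a 24,000 x 24,000 matrix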
It is possible to use sparse matrices to construct big diagonal matrices:
library(Matrix)
object.size(Diagonal(x=1:96000))
## 769168 bytes
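For comparison at a size where the dense version still fits comfortably in memory (a sketch; the 5,000-element vector is just an illustration, not taken from the question):
library(Matrix)
x <- rnorm(5000)
format(object.size(diag(x)), "Mb")          ## ~191 Mb: dense 5,000 x 5,000 matrix
format(object.size(Diagonal(x = x)), "Kb")  ## ~40 Kb: only the diagonal is stored
Sparse classes from Matrix have methods for %*%, crossprod(), solve() and friends, but that only helps if the downstream code is written to accept a Matrix object.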
More generally, not all analysis programs are written with computational efficiency (either speed or memory) in mind. The papers on which this method is based (?sm.ancova) were written in the late 1990s, when 24,000 observations would have constituted a huge data set ...
Upvotes: 4