Reputation: 85
I have a matrix of approximately 240 million rows and 3 columns that I need to "import and work with" in R. I do not have access to a server right now, so I got the idea of importing a submatrix, working with it, discarding it from the environment, and repeating the procedure until the whole matrix is done (for what I have to do this works just as well). In particular, since the number of rows is a multiple of 11, I decided to work with 11 submatrices. Therefore, what I have been doing is the following:
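Roughly, each block is read with something like this (a sketch of my block-wise procedure; the file name is a placeholder and the exact call in my script may differ slightly):

    Ntot  <- 240e6               # approximate total number of rows (the real count is a multiple of 11)
    Nstep <- ceiling(Ntot / 11)  # number of rows per block

    for (n in 1:11) {
      mat.n <- read.table("bigfile.txt",            # placeholder file name
                          header = TRUE,            # the setting discussed below
                          skip   = (n - 1) * Nstep, # skip the blocks already processed
                          nrows  = Nstep)           # read one block only
      # ... work with the block ...
      rm(mat.n)                                     # discard the block before reading the next one
    }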
After finishing importing the 6th block, I realized I had been leaving header=T, so I set header=F. Since then, every time I try importing the file the R session aborts. EDIT: setting header=T back does not work either.
I thought it depended on the header=F change, but that was not the case. Therefore I guess it has to do with Nstep, or with the first row of the submatrix. I tried some experiments:
- if I re-import the first block, it works;
- if I import only the first ten rows of the 5th block, it takes ages (I started it about 20 minutes ago and it has not finished yet, even though it is just 10 rows);
- if I repeat it in R instead of RStudio, I have the same issues.
Any idea why this is happening? I am working with R version 3.1.1 in RStudio version 0.98.1028, platform: x86_64-w64-mingw32/x64 (64-bit).
Upvotes: 0
Views: 906
Reputation: 53
To avoid running out of memory, you can remove the matrix after working with each block via rm(mat.n) and then free the memory with gc().
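For example (a sketch, assuming the block is stored in mat.n as in the question):

    mat.n <- read.table("bigfile.txt", nrows = 1000)  # placeholder read of one block
    # ... work with mat.n ...
    rm(mat.n)   # remove the block from the environment
    gc()        # run garbage collection so the freed memory can be reused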
Upvotes: 0
Reputation:
There are better alternatives to the read.* functions for big data files, specifically the data.table package's fread() function, or the readr package, which has slightly safer alternatives to fread (a bit slower than fread, but still very fast compared to the original read.* functions).
At the end of the day you will still be limited by your computer's memory. There are workarounds for that too, but I think fread() or readr will do just fine in your case.
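For example, a single block could be read like this (a sketch; the file name and the block size are placeholders, not values from your setup):

    library(data.table)

    Ntot  <- 240e6               # approximate total rows
    Nstep <- ceiling(Ntot / 11)  # rows per block

    # Read only the 5th block; fread accepts skip and nrows much like read.table,
    # but is far faster on files of this size.
    mat5 <- fread("bigfile.txt", header = FALSE,
                  skip = 4 * Nstep, nrows = Nstep)

    # A similar readr alternative (slightly slower, but stricter about parsing):
    # readr::read_delim("bigfile.txt", delim = " ", col_names = FALSE,
    #                   skip = 4 * Nstep, n_max = Nstep)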
Upvotes: 1