Robbert

Reputation: 51

How to prevent memory problems in an R-loop using large files?

I am trying to run a for-loop in R (2.15.0) over large files/matrices (Windows Vista 64-bit, 4 GB RAM). A single iteration of the loop works fine, but when I run it over multiple files (and I would like to process hundreds of them), I run out of memory. I tried removing the objects at the end of each iteration, but the memory is not returned to Windows (as I can see in the Windows Task Manager). It seems that R keeps requesting more and more memory from the OS instead of reusing the memory freed by removing the objects. Is there any workaround for this? If you need more details about the research question, I will be happy to share them to find a proper solution.

Thanks already! Cheers, Robbert

> library(VariantAnnotation)
> fi<-list.files("E:/1000genomes/chr22",full.names=T)
> for(i in 1:length(fi)) {
+   input=paste("smaller.00", i, ".gz", sep = "")
+   output=paste("geno.", i, ".RData", sep = "")
+   vcf = readVcf(input, "hg19")
+   genotypes=geno(vcf)$GT[,]
+   save(genotypes, file=output)
+   gc()
+   }
Error: scanVcf: Realloc could not re-allocate memory (873600000 bytes)
  path: E:\1000genomes\chr22\smaller.002.gz
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  Reached total allocation of 3963Mb: see help(memory.size)
2: In doTryCatch(return(expr), name, parentenv, handler) :
  Reached total allocation of 3963Mb: see help(memory.size)
> gc()
           used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells  4543758 242.7   12363911 660.4  18010556 961.9
Vcells 19536404 149.1   61090604 466.1 119317584 910.4

And if I remove the objects at the end of the loop body:

+   save(genotypes, file=output)
+   rm(vcf)
+   rm(genotypes)
+   rm(input)
+   rm(output)
+   rm(getal)
+   rm(i)
+   }
Error: scanVcf: Calloc could not allocate memory (18 of 1 bytes)
  path: E:\1000genomes\chr22\smaller.001.gz
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  Reached total allocation of 3963Mb: see help(memory.size)
> gc()
          used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells 2355472 125.8   10798339 576.7  16872405 901.1
Vcells 1992717  15.3   62280756 475.2 105556441 805.4

I also found on the internet that running the script from the command prompt may help, so I put it in the file "runthis.R" in the R root directory and ran: Rscript.exe runthis.R --no-save --no-restore. It processed one extra file and then reported the same error.
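One aside about that command line (not the fix itself): Rscript expects R options before the script file name; anything placed after runthis.R is passed to the script as arguments (visible via commandArgs()) rather than interpreted by R. So the intended invocation would be:

```shell
# R options must precede the script name; Rscript already implies --no-save
Rscript.exe --no-save --no-restore runthis.R
```

This would not have changed the memory behaviour here, since Rscript runs with --no-save by default anyway.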

Upvotes: 4

Views: 3536

Answers (1)

Robbert

Reputation: 51

Somehow I had made a mistake in the analyses. After a lot of hassle, I found out that rm() followed by gc() does indeed work in my case. Thanks mneI for pointing this out :)
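For reference, a minimal sketch of the working loop, following the code from the question (the file-name pattern smaller.00*.gz is from my setup); the key is to rm() the large objects and only then call gc(), so R can reuse the freed memory on the next iteration:

```r
library(VariantAnnotation)

fi <- list.files("E:/1000genomes/chr22", full.names = TRUE)

for (i in seq_along(fi)) {
  input  <- paste("smaller.00", i, ".gz", sep = "")
  output <- paste("geno.", i, ".RData", sep = "")

  vcf <- readVcf(input, "hg19")      # load one VCF at a time
  genotypes <- geno(vcf)$GT          # extract the genotype matrix
  save(genotypes, file = output)

  rm(vcf, genotypes)                 # drop the large objects first...
  gc()                               # ...then trigger garbage collection
}
```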

Upvotes: 1
