Reputation: 56
I am trying to import a multiVCF file (created with GATK; ~80 individuals; 4.3 GB) into R using the package PopGenome and its readData function. Unfortunately, the import always aborts with the error message "R encountered a fatal error, session terminated". With a smaller data set it works fine.
I also tried compressed VCFs (bgzip), but that did not work for me either. Am I missing something? Does my PC not have enough computing resources?
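For completeness, a chunk-wise read of a bgzipped, tabix-indexed VCF would look roughly like this with PopGenome::readVCF (chromosome name, chunk size and positions here are placeholders, not my real values):

library(PopGenome)

# readVCF expects a bgzipped VCF with a tabix index (.tbi) next to it
genome_chunk <- readVCF("my_multiVCF_from_GATK.vcf.gz",
                        numcols = 10000,   # number of SNPs held in memory per chunk
                        tid     = "chr1",  # chromosome/scaffold to read
                        frompos = 1,
                        topos   = 1000000)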
Has anyone had similar experiences, or does anyone know how to solve this problem? I would be very grateful for any advice.
Kind regards
Pavlo
My code:
library(rtracklayer)  # provides readGFF() and export()

# split the genome-wide GFF into one file per chromosome/scaffold
# (chromosomes and gff_path are defined earlier in my script)
for (chr in chromosomes) {
  my_filter <- list(seqid = chr)
  gff3_out  <- file.path(gff_path, paste(chr, ".gff", sep = ""))
  export(readGFF("/path/to/my/gff.gff", filter = my_filter), gff3_out)
}

# split the multi-sample VCF into one file per scaffold
PopGenome::VCF_split_into_scaffolds("my_multiVCF_from_GATK.vcf", "scaffoldVCFs2")

# read all scaffold VCFs plus the matching GFFs; this is the call that crashes
allgenomes <- PopGenome::readData("path/to/data/with_VCFs", format = "VCF",
                                  gffpath = "path/to/data/gff_data",
                                  big.data = TRUE)
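With the smaller data set that does go through, a quick sanity check on the resulting GENOME object (standard PopGenome accessors) looks like this:

# confirm that regions and sites were actually read
allgenomes@n.sites                   # number of sites per region
PopGenome::get.sum.data(allgenomes)  # per-region summary table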
My PC:
Windows 10
Intel(R) Core(TM) i7-8565U CPU @ 1.80 GHz
RAM: 32.0 GB (31.9 GB usable)
System type: 64-bit operating system, x64-based processor
Upvotes: 1
Views: 266
Reputation: 56
I have now tried on another Windows machine with a fresh R installation, unfortunately with the same error.
I also monitored memory usage in Windows while the data set was being read by readData, and it gave no indication that a lack of RAM was the cause of the crash.
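For anyone reproducing this check: besides the Task Manager, the Windows-only base R helpers can be watched alongside the run (available in the R versions I used):

memory.limit()           # memory limit R is allowed to use, in MB
memory.size(max = TRUE)  # maximum memory obtained from the OS so far, in MB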
After that, I read the data on a Linux server with PopGenome and readData, and it ran wonderfully.
My temporary solution is therefore the Linux server. However, I still don't know whether it is the Windows version of R that cannot handle larger data sets in readData, or Windows in general. Maybe someone can answer this question. It really is a shame that such an attractive R package/function cannot be used with larger data sets on Windows (at least in my case).
Upvotes: 1