How to read a very large text file (~15GB)?

Question

I have a very large .txt file which I need to load into RStudio. The file contains ~850,000 Columns and ~2000 rows. Too large to open in Excel to convert to .csv. A description of the txt file contents:

SampleID,CpGprobeID1,CpGprobeID2(x850K)...
ID1,betavalue1(x850k)
ID2,betavalue1(x850K)...

Of this .txt file, I actually only need ~30000 of the columns (CpGprobeIDs). I have a .csv file containing those 30000 required CpGprobeIDs, listed in rows rather than columns (call this file CpGprobeIDsREDUCED.csv)

Is there a way that I can reduce the columns in the 15GB .txt file to only containing the ID's from the ~30000 rows CpGprobeIDsREDUCED.csv, or another way to load such a large file into RStudio?

So far I have unsucessfully tried:

library(data.table)
filename <- fread("filename.txt")

^ R encounters a fatal error and the session aborts. This actually does work for a smaller sample size of ~24 rows, even with the 850K columns, so this is the closest I got.

So then I tried converting it to a csv file containing only the ~30000 required IDs and encountered the following error message:

library(readr)
CpGprobeIDsREDUCED <- read_csv("CpGprobeIDsREDUCED.csv")
library(sqldf)
CpGkeep <- CpGprobeIDsREDUCED$Name
import_filename <- fn$read.csv.sql("filename.txt",  sep="	", sql = "select * from file where 
SampleID in ( `toString(CpGkeep)` )")

Error: too many columns on file
In addition: Warning message:
In for (i in seq_along(col)) col[i] <- length(scan(file, what = "", :
closing unused connection 3 (filename.txt)

I'm at a bit of a loss as to what to do next, any advice appreciated.

Thanks

How to read a very large text file (~15GB)?

Answers (1)

Related Questions