Read a large data set via fread in R but only need a subset (one variable that equals some values)

Question

I am trying to read a large dataset (>30 GB) in R, but my laptop only has 16 GB of RAM. I only need a subset of this dataset: all the observations whose ID (one variable in my dataset holds this ID) equals one of a set of values, and those values come from another dataset. If I had enough RAM, the natural approach would be to read both data files first and then merge them by the common ID.

Given the limited RAM, is it possible to pre-process the data file with a shell command that I can then pass as the cmd argument of fread? Or does anyone have an alternative solution? Thanks in advance!
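
To make the kind of pre-filtering I have in mind concrete, here is a minimal sketch using awk. The file names (ids.csv, big.csv, filtered.csv), the tiny sample data, and the assumption that the ID sits in column 1 of a plain comma-separated file with a header are all hypothetical, chosen only for illustration. It would not handle quoted fields that contain commas.

```shell
# Hypothetical sample data; the real files would already exist on disk.
# ids.csv: the small file with the IDs to keep (header plus one ID per line).
printf 'id\n2\n4\n' > ids.csv

# big.csv: stands in for the >30 GB file; the ID is assumed to be column 1.
printf 'id,value\n1,a\n2,b\n3,c\n4,d\n2,e\n' > big.csv

# First file (NR==FNR): remember every ID, skipping the header line.
# Second file: print its header, plus every row whose column 1 is a remembered ID.
awk -F, 'NR==FNR { if (FNR > 1) keep[$1]; next } FNR==1 || ($1 in keep)' \
    ids.csv big.csv > filtered.csv

cat filtered.csv
```

awk streams the big file line by line and only keeps the ID list in memory, so the 16 GB limit should not be an issue as long as the ID list itself is small. The same awk command (without the redirect to filtered.csv) is what I would expect to pass as a string to fread's cmd argument, so that R only ever sees the matching rows.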

Answers (1)