Reputation: 1935
I am running into a big problem with importing data into R. The original dataset is over 5 GB, which there is no way I can read into my laptop with only 4 GB of RAM in total. There is an unknown number of rows in the dataset (at least thousands of rows). I was wondering if I could load, say, only the first 2000 rows into R so that the data still fits in my working memory?
Upvotes: 0
Views: 839
Reputation: 121077
As Scott mentioned, you can limit the number of rows read from a text file with the nrows argument to read.table (and its variants like read.csv). You can use this in conjunction with the skip argument to read later chunks of the dataset.
my_file <- "my file.csv"
chunk <- 2000
# First chunk: the header plus the first 2000 data rows
first <- read.csv(my_file, nrows = chunk)
# Later chunks: skip the header and the rows already read; header = FALSE
# stops a data row being mistaken for the header row
second <- read.csv(my_file, nrows = chunk, skip = 1 + chunk,
                   header = FALSE, col.names = names(first))
third <- read.csv(my_file, nrows = chunk, skip = 1 + 2 * chunk,
                  header = FALSE, col.names = names(first))
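Since you don't know how many rows the file has, the same idea extends to a loop that keeps reading until the file is exhausted. A minimal sketch of that, where process() is a hypothetical placeholder for whatever you do with each chunk (summarise, aggregate, write out a subset, ...):
# Read the column names once from the header line
header <- names(read.csv(my_file, nrows = 1))
offset <- 1  # lines already consumed (the header)
repeat {
  piece <- tryCatch(
    read.csv(my_file, nrows = chunk, skip = offset,
             header = FALSE, col.names = header),
    error = function(e) NULL  # read.csv errors when no lines are left
  )
  if (is.null(piece) || nrow(piece) == 0) break
  process(piece)  # hypothetical placeholder for your own per-chunk work
  offset <- offset + nrow(piece)
}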
You may also want to read the "Large memory and out-of-memory data" section of the high-performance computing task view.
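Another common option for large files is data.table::fread(), which also accepts nrows and skip arguments and is typically much faster than read.csv. A rough sketch, assuming the data.table package is installed:
library(data.table)

# fread() reads the first chunk and detects the column types
first_dt <- fread(my_file, nrows = chunk)
# Later chunks: skip the header plus the rows already read,
# reusing the column names from the first chunk
second_dt <- fread(my_file, nrows = chunk, skip = 1 + chunk,
                   header = FALSE, col.names = names(first_dt))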
Upvotes: 4