Gonçalo Peres

Reputation: 13582

Split dataset per rows into smaller files in R

I am analyzing a dataset of 1.14 GB (1,232,705,653 bytes).

When reading the data in R:

trade = read.csv("commodity_trade_statistics_data.csv")

one can see that it has 8,225,871 rows and 10 columns.


As I intend to analyze the dataset in a data-wrangling web app that limits imports to 100 MB, how can I split the data into files of at most 100 MB each?

The split should be by rows, and each file should include the header.
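For scale, the byte count above already pins down the minimum number of files:

# 1,232,705,653 bytes / 100 MB per file ≈ 12.33,
# so at least 13 files are needed
ceiling(1232705653 / (100 * 1000 * 1000))
# 13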

Upvotes: 5

Views: 1342

Answers (2)

Quar

Reputation: 1082

Alternatively, with the tidyverse, use readr::read_csv_chunked and readr::write_csv, like:

library(tidyverse)

inputfile = "./commodity_trade_statistics_data.csv"

chunk.size = 100 * 1000 * 5   # rows per chunk (500,000)

# Callback: writes each chunk to its own numbered csv file;
# `pos` is the row position of the first row in the chunk.
proc.chunk = function(df, pos) {

  df %>%
    mutate(row_idx = seq(pos, length.out = nrow(df))) %>%  # optional, preserve row number within original csv
    write_csv(
      paste0("chunk_", floor(pos / chunk.size), ".csv")
    )

}

read_csv_chunked(
  inputfile,
  callback = SideEffectChunkCallback$new(proc.chunk),
  chunk_size = chunk.size,
  progress = TRUE   # optional, show a progress bar
)

Within readr::read_csv_chunked one can also set the parameter locale via readr::locale, which lets you define the datetime format and the timezone.
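For example, a minimal sketch (the date format and timezone here are assumptions; adjust them to the actual columns):

read_csv_chunked(
  inputfile,
  callback = SideEffectChunkCallback$new(proc.chunk),
  chunk_size = chunk.size,
  locale = locale(date_format = "%Y-%m-%d", tz = "UTC")  # hypothetical formats
)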


To "lazify" the estimation of chunk.size, I wrote a crude function based on simple statistics:

calc.chunk.size = function(
                  file,
                  limit.mb.per.file = 100,  # limit in MB per file
                  floor.to.nrow = 10000,    # floor nrows per file to a round number
                  size.estimate.sigma = 5,  # assume row size is normally distributed and the
                                            # first n rows are representative;
                                            # 3-sigma leaves a ~0.1% chance of a file
                                            # exceeding the limit; 5-sigma is closer to Six Sigma :)
                  sample.first.nrow = 100L
                  ) {

  # total row count, for reference (requires package "R.utils")
  tot.nrow = R.utils::countLines(file)

  # alternatively, on a POSIX machine (Linux or macOS):
  # tot.nrow = system("wc -l ./commodity_trade_statistics_data.csv", intern = TRUE) %>%
  #            str_extract("\\d+") %>%
  #            as.numeric()

  # sample the first N data rows and collect byte-size statistics
  bytes.per.row = read_lines(file,
                             skip = 1,  # skip the header, or the 1st row anyway ...
                             n_max = sample.first.nrow  # sample at most the first N rows
                             ) %>%
                   nchar(type = "bytes") %>% { list(mean = mean(.), sd = sd(.)) }

  # conservative per-row estimate: mean plus a sigma-sized safety margin
  est.bytes.per.row = bytes.per.row$mean + size.estimate.sigma * bytes.per.row$sd

  est.chunk.size = limit.mb.per.file * 1000 * 1000 / est.bytes.per.row

  # round down to a multiple of floor.to.nrow, keeping at least one multiple
  est.chunk.size = max(c(1, floor(est.chunk.size / floor.to.nrow))) * floor.to.nrow

  est.chunk.size

}


chunk.size = calc.chunk.size(inputfile)
# chunk.size = 540000 for this particular file
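As a quick sanity check after the chunked run (assuming the chunk files landed in the working directory), confirm that no output file exceeds the limit:

chunk.files = list.files(pattern = "^chunk_\\d+\\.csv$")
file.size(chunk.files) / 1e6   # sizes in MB; all should be below 100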

Upvotes: 1

eastclintw00d

Reputation: 2364

Split the data frame into the desired number of chunks. Here is an example with the built-in mtcars dataset:

no_of_chunks <- 5

f <- ceiling(seq_len(nrow(mtcars)) / nrow(mtcars) * no_of_chunks)

res <- split(mtcars, f)
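For mtcars (32 rows) this grouping factor splits the rows into five nearly equal groups:

table(f)
# f
# 1 2 3 4 5
# 6 6 7 6 7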

You can then save the results back as CSV files using purrr:

library(purrr)
map2(res, paste0("chunk_", names(res), ".csv"), write.csv)
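Note that write.csv includes a row-names column by default; if the downstream app should not see it, a small variant using purrr::walk2 (the side-effect counterpart of map2) drops it:

walk2(res, paste0("chunk_", names(res), ".csv"),
      ~ write.csv(.x, .y, row.names = FALSE))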

Edit from the question author: in the context of my question, the following script solved the problem:

trade = read.csv("commodity_trade_statistics_data.csv")

no_of_chunks <- 14

f <- ceiling(seq_len(nrow(trade)) / nrow(trade) * no_of_chunks)

res <- split(trade, f)

library(purrr)
map2(res, paste0("chunk_", names(res), ".csv"), write.csv)

Upvotes: 6
