khajlk

Reputation: 861

Splitting data from a CSV file and writing it to files in slices using R

I have a CSV file containing 956,678 rows. The following R code reads the file and splits the data into groups of 65,000 rows each, with the remaining rows going to the last group.

my_file <- read.csv("~myfile_path/file.csv")
grps <- (split(my_file, (seq(nrow(my_file))-1) %/% 65000))
for (i in grps)
{
write.csv(grps, paste("path/output_file", i, ".csv", sep=""))
}

Now I would like to write these groups to disk as CSV files. Can anyone suggest how to do that?

EDIT1:

Based on the comments, I have modified the code and am now getting the following error:

Error in data.frame(0 = list(nih_addr_id = c(664L, 665L, 666L, 667L, : arguments imply differing number of rows: 65000, 46677

Upvotes: 4

Views: 4208

Answers (2)

ToWii

Reputation: 660

Here is a solution using lapply and fwrite from data.table, which is fast even for large datasets. The vector of row numbers, my_file_rows, is split into chunks of 65,000 as set by chunk_size. The split function takes care of the remainder automatically, so the last chunk simply holds whatever rows are left. To change the number of rows per file, adjust chunk_size. Each file name includes the first row number of its chunk, taken from x[1].

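  library(data.table)  # provides fwrite

  my_file_rows <- seq(1, nrow(my_file))
  chunk_size <- 65e3

  # Split the row numbers into chunks, then write each chunk's rows to its own file
  lapply(split(my_file_rows, ceiling(my_file_rows/chunk_size)), function(x){

    fwrite(my_file[x,], paste0("path/output_file", x[1], ".csv"))

  })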
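
To check how the row numbers get chunked without writing anything to disk, here is a minimal base R sketch. The row count n_rows is an assumption inferred from the 65000 and 46677 figures in the error message in the question, not a value read from the actual file, and the helper names chunk_lengths and first_rows are only for illustration.

```r
# Assumed row count (hypothetical; inferred from the error message in the question)
n_rows <- 956677
chunk_size <- 65e3

row_ids <- seq_len(n_rows)

# ceiling(row_ids / chunk_size) labels rows 1..65000 as 1,
# rows 65001..130000 as 2, and so on; split groups the rows by that label
chunks <- split(row_ids, ceiling(row_ids / chunk_size))

# Number of rows in each chunk; only the last chunk can be shorter
chunk_lengths <- lengths(chunks)

# First row number of each chunk, i.e. the value pasted into each file name
first_rows <- vapply(chunks, function(x) x[1], integer(1))
```

Under that assumed row count, chunk_lengths should show full chunks of 65,000 followed by one shorter final chunk, and first_rows gives the suffixes the output files would get.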

Upvotes: 2

Scott Warchal

Reputation: 1027

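Your write.csv call in the loop is trying to write the whole list grps as a .csv file, rather than a single data frame element of the list. To do that, R has to coerce the list into one data frame, which fails because the last group has fewer rows than the others. Loop over the indices of the list and write one element at a time instead.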

Try:

my_file <- read.csv("~myfile_path/file.csv")
grps <- (split(my_file, (seq(nrow(my_file))-1) %/% 65000))
for (i in seq_along(grps)) {
    write.csv(grps[[i]], paste0("path/output_file", i, ".csv"))
}
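
If you want to try the loop before running it on the full file, here is a small self-contained sketch of the same approach. It assumes the built-in mtcars data as a stand-in for your file, a hypothetical chunk size of 10, and a temporary output directory. row.names = FALSE is my own addition so that the row names are not written as an extra column.

```r
# Small stand-in for the real file: mtcars ships with R and has 32 rows
my_file <- mtcars
chunk_size <- 10  # hypothetical; 65000 in the question

# Same grouping as above: rows 1-10 get label 0, rows 11-20 get label 1, ...
grps <- split(my_file, (seq_len(nrow(my_file)) - 1) %/% chunk_size)

# Hypothetical output directory inside the session's temp folder
out_dir <- tempfile("chunks_")
dir.create(out_dir)

for (i in seq_along(grps)) {
  # grps[[i]] is a single data frame; grps on its own is the whole list
  write.csv(grps[[i]],
            file.path(out_dir, paste0("output_file", i, ".csv")),
            row.names = FALSE)
}

# Read the pieces back and confirm that no rows were lost
out_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
rows_written <- sum(vapply(out_files, function(f) nrow(read.csv(f)), integer(1)))
```

If rows_written matches nrow(my_file), every row ended up in exactly one output file.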

Upvotes: 3
