Reputation: 861
I have data in a CSV file containing 956,678 rows. The following piece of code reads the file and splits the data into groups in R (each group having 65,000 rows, with the remainder going to the last group).
my_file <- read.csv("~myfile_path/file.csv")
grps <- split(my_file, (seq(nrow(my_file)) - 1) %/% 65000)
for (i in grps)
{
  write.csv(grps, paste("path/output_file", i, ".csv", sep = ""))
}
Now, I would like to write these groups to disk as CSV files. Can anyone suggest how to do that?
EDIT1:
Based on the comments, I have modified the code and getting the following error:
Error in data.frame(`0` = list(nih_addr_id = c(664L, 665L, 666L, 667L,  :
  arguments imply differing number of rows: 65000, 46677
Upvotes: 4
Views: 4208
Reputation: 660
Here is a solution with lapply and data.table, which is fast even for large datasets. The file is chunked by splitting the vector my_file_rows into chunks of 65,000 row numbers, as set by chunk_size; the remainder is automatically taken care of by the split function. You can easily adjust the number of rows per chunk by changing chunk_size to your preference. This solution pastes the first row number of each chunk into the file name via x[1].
library(data.table)  # provides fwrite()

my_file_rows <- seq(1, nrow(my_file))
chunk_size <- 65e3

# split() groups the row numbers into chunks of chunk_size;
# the last chunk simply holds whatever rows remain
lapply(split(my_file_rows, ceiling(my_file_rows / chunk_size)), function(x) {
  fwrite(my_file[x, ], paste0("path/output_file", x[1], ".csv"))
})
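Since data.table is loaded anyway, the initial read can also be done with fread, which is typically much faster than read.csv for a file of this size. A minimal sketch, reusing the path from the question:

library(data.table)

# fread() reads large CSVs much faster than read.csv() and returns a
# data.table, which the chunked fwrite() call above indexes the same way
my_file <- fread("~myfile_path/file.csv")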
Upvotes: 2
Reputation: 1027
Your write.csv in the loop is trying to write the whole list as a .csv file, rather than the data frame element of the list.
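That mismatch is what the error in EDIT1 is showing: write.table tries to coerce the whole list into one data frame and fails because the groups have different row counts (65,000 vs. 46,677). A minimal sketch that reproduces the same failure mode, using a toy data frame with a hypothetical column x:

df <- data.frame(x = 1:10)
bad <- split(df, (seq_len(nrow(df)) - 1) %/% 7)  # groups of 7 and 3 rows
write.csv(bad, "out.csv")
# Error in data.frame(`0` = list(x = 1:7), `1` = list(x = 8:10), ... :
#   arguments imply differing number of rows: 7, 3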
Try:
my_file <- read.csv("~myfile_path/file.csv")
grps <- split(my_file, (seq(nrow(my_file)) - 1) %/% 65000)
for (i in seq_along(grps)) {
  write.csv(grps[[i]], paste0("path/output_file", i, ".csv"))
}
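Note that write.csv writes row names by default, and after split those are the original row numbers from my_file. If you don't want that extra leading column in the output files, pass row.names = FALSE:

for (i in seq_along(grps)) {
  # row.names = FALSE drops the leading column of original row numbers
  write.csv(grps[[i]], paste0("path/output_file", i, ".csv"), row.names = FALSE)
}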
Upvotes: 3