Split CSV file into multiple files using R without changing its data formatting

Question

I have a myfile.csv file below whose columns are separated by a pipe (|) and I want to use R to split the file into 2 csv files with equal number of rows and with each csv file retaining the header line.

userid|logindatetime|logoutdatetime|description
111|2024-02-10 08:00:00|2024-02-10 08:00:00|systemadministrator
112|2024-02-11 08:00:00|2024-02-10 08:00:05|user1
113|2024-02-12 08:00:00|2024-02-10 08:00:10|user2
114|2024-02-13 08:00:00|2024-02-10 08:00:15|user3
115|2024-02-14 08:00:00|2024-02-10 08:00:20|user4
116|2024-02-15 08:00:00|2024-02-10 08:00:25|user5
117|2024-02-16 08:00:00|2024-02-10 08:00:30|user6
118|2024-02-17 08:00:00|2024-02-10 08:00:35|user7
111|2024-02-18 08:00:00|2024-02-10 08:00:40|systemadministrator
119|2024-02-18 08:00:00|2024-02-10 08:00:45|user8

I have followed the solution below which was suggested here Split large file in R into smaller files with a loop which partly works.

library(tidyverse)

rowCount <- 2
data %>% 
  mutate(Group = ceiling((row_number()) / rowCount)) %>% 
  group_by(Group) %>% 
  group_walk(
    function(.x, .y) {
      write.csv(.x, file = paste0("myfile.csv", .y$Group, ".csv"))
    }
  )

However the suggested solution, does not maintain the pipe (|) separator but instead introduces the comma (,) separator and it also introduces double quotes("") on string values as well as a row counter. I need to maintain the original structure and format of my csv file when it is split. Below is the expected output for my 2 files after splitting.

Expected output: file 1

userid|logindatetime|logoutdatetime|description
111|2024-02-10 08:00:00|2024-02-10 08:00:00|systemadministrator
112|2024-02-11 08:00:00|2024-02-10 08:00:05|user1
113|2024-02-12 08:00:00|2024-02-10 08:00:10|user2
114|2024-02-13 08:00:00|2024-02-10 08:00:15|user3
115|2024-02-14 08:00:00|2024-02-10 08:00:20|user4

Expected output: file 2

userid|logindatetime|logoutdatetime|description
116|2024-02-15 08:00:00|2024-02-10 08:00:25|user5
117|2024-02-16 08:00:00|2024-02-10 08:00:30|user6
118|2024-02-17 08:00:00|2024-02-10 08:00:35|user7
111|2024-02-18 08:00:00|2024-02-10 08:00:40|systemadministrator
119|2024-02-18 08:00:00|2024-02-10 08:00:45|user8

The incorrect output I get e.g., for output 1 is below

"","userid","logindatetime","logoutdatetime","description"
"1",111,"2024-02-10 08:00:00","2024-02-10 08:00:00","systemadministrator"
"2",112,"2024-02-11 08:00:00","2024-02-10 08:00:05","user1"
"3",113,"2024-02-12 08:00:00","2024-02-10 08:00:10","user2"
"4",114,"2024-02-13 08:00:00","2024-02-10 08:00:15","user3"
"5",115,"2024-02-14 08:00:00","2024-02-10 08:00:20","user4"

I have tried to specifiy the column separator (sep = "|") and ignore the (quote = "") and include headers (header = TRUE) using the code below.

rowCount <- 2
data %>% 
  mutate(Group = ceiling((row_number()) / rowCount)) %>% 
  group_by(Group) %>% 
  group_walk(
    function(.x, .y) {
      write.csv(.x, file = paste0("myfile.csv", .y$Group, ".csv", sep = "|", header = TRUE, quote = "", stringsAsFactors = FALSE))
    }
  )

However I get the error below. How can I get my desired output?

Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'myfile.csv1.csv|TRUEFALSE': Invalid argument

Split CSV file into multiple files using R without changing its data formatting

Answers (1)

Related Questions