dunguyen
dunguyen

Reputation: 33

R: read a csv file from multiple folders and write a xslx file keeping the sheet names

The directories' structure is:

data -> topic1 -> question1 -> sheetName.csv
               -> question2 -> sheetName.csv
               ...
     -> topic2 -> question1 -> sheetName.csv
               -> question2 -> sheetName.csv
     ...

The output I want an an excel file for each 'topic'. In each file, there are sheets that correpsond to the sheetName.csv within that topic. E.g. an excel file named: topic1.xlsx with 3 sheets, coresponding to 3 sheetName.csv files in topic 1.

BUT I also want to keep the sheet names as in the original .csv files. Note that the 'sheetName' is random (i.e. not follow any pattern).

Here are the codes I have tried so far:

library(readxl)
library(writexl)
library(dplyr)

pathName <- "/data/"
topicName <- list.files(path = pathName)
for(i in 1:length(topicName)) {
  topicPath <- paste(pathName, topicName[[i]], sep = "")
  files_to_read = list.files(
    path = topicPath,
    pattern = '*.csv',
    recursive = TRUE,
    full.names = TRUE
  )
  data_lst <- list()  
  data_lst <- lapply(files_to_read, read.csv)
  setwd(pathName)  
  write_xlsx(data_lst, path = paste(topicName[[i]], ".", "xlsx", sep = ""))
}

The output I got is an excel file for each topic with the corresponding csv sheets, but the sheetnames are "sheet 1, sheet 2, etc...". Is there a way to keep the sheet names while writing to an excel file?

Upvotes: 0

Views: 288

Answers (1)

Tyler Burleigh
Tyler Burleigh

Reputation: 537

OK first I'll programmatically generate CSV files that mirrors the directory structure you described. The CSVs will be named as random strings of digits.

dir.create('data')
topics <- c("topic1", "topic2")
questions <- c("question1", "question2")

for(i in 1:length(topics)){
  dir.create(paste0('data/', topics[i]), showWarnings = F)
  for(j in 1:length(questions)){
    dir.create(paste0('data/', topics[i], "/", questions[j]), showWarnings = F)
    for(k in 1:3){
      set.seed(Sys.time())
      Sys.sleep(1)
      sheet <- as.character(round(runif(1, 1, 99999999)))
      print(sheet)
      file.name = paste0('data/', topics[i], "/", questions[j], "/", sheet, ".csv")
      write.csv(data.frame(x = 1), file = file.name)
    }
  }
}

Next, to answer your question,

To write the CSV sheet names as XLSX workbook names, I created a for loop that gets the sheet name from the file name using two calls to strsplit(), and then calls xlsx::write.xlsx() to write the file. I used the xlsx package for writing xlsx because it allows specifying a sheet name and writing to the same xlsx with an append flag.

library(xlsx)
library(dplyr)

pathName <- "data/"
topicName <- list.files(path = pathName)
for(i in 1:length(topicName)) {
  topicPath <- paste(pathName, topicName[[i]], sep = "")
  files_to_read = list.files(
    path = topicPath,
    pattern = '*.csv',
    recursive = TRUE,
    full.names = TRUE
  )
  data_lst <- list()
  for(k in 1:length(files_to_read)){
    sheet_name <- strsplit(strsplit(files_to_read[k], "/")[[1]][4], "\\.")[[1]][1]
    file_name <- paste0("data/", topicName[[i]], ".xlsx")
    dat <- read.csv(files_to_read[k])
    write.xlsx(dat, file=file_name, sheetName=sheet_name, row.names=F, append=T)
  }
}

Upvotes: 0

Related Questions