coRn
coRn

Reputation: 3

How to extract comments from multiple similar structured excel files in R?

I have 1000 .xlsx files with the same structure. All contain a header row (id, filling date, Item 1 to 11) and a row with values. The cell under Item 11 contains a comment in most files. How can I extract the comments from all files and combine them into a single object in R?

I managed to combine all files into a single data.frame by creating a list of the files files <- list.files(pattern = "*.xlsx", full.names = T), reading them into R using sapply(files, read_excel)and combining them with bind_rows() however, read_excel doesn't import the comments. I used the packages readxl and dplr for this.

I also managed to extract the comment from a single file using xlsx_cells("file.xlsx") and x[x$address=="N8", c("address", "comment")] but I don't know how to do this with multiple files. I used the package dplrand tidyxlfor this approach.

I would very much appreciate your help!

Upvotes: 0

Views: 243

Answers (1)

Jon Spring
Jon Spring

Reputation: 66480

Here's an approach using purrr:

Edit: altered solution to output the source file for each comment and to work with files lacking such a comment, as the OP specified the comment is there in "most" files.

library(tidyxl)
library(purrr)

# First, here's a list of xlsx files in the directory:
file_list <-  list.files() %>%
  .[str_detect(., ".xlsx")]
file_list
#[1] "test1.xlsx"            "test2.xlsx"            "test3 no comment.xlsx"


# Make a new tibble with two columns: 
#   file_name   is the source file we're looking at
#   comments    extracts the comments in N8, if any
tibble(file_name = file_list,
       comments = map(file_list,
                      ~ xlsx_cells(.) %>%
                        subset(address == "N8", comment))) %>%
  unnest(comments, keep_empty = TRUE)


## A tibble: 3 x 2
#  file_name             comment          
#  <chr>                 <chr>            
#1 test1.xlsx            Comment in file 1
#2 test2.xlsx            Comment in file 2
#3 test3 no comment.xlsx NA     

Upvotes: 2

Related Questions