Reputation: 3
I have 1000 .xlsx files with the same structure. All contain a header row (id, filling date, Item 1 to 11) and a row with values. The cell under Item 11 contains a comment in most files. How can I extract the comments from all files and combine them into a single object in R?
I managed to combine all files into a single data.frame by creating a list of the files files <- list.files(pattern = "*.xlsx", full.names = T)
, reading them into R using sapply(files, read_excel)
and combining them with bind_rows()
however, read_excel
doesn't import the comments. I used the packages readxl
and dplr
for this.
I also managed to extract the comment from a single file using xlsx_cells("file.xlsx")
and x[x$address=="N8", c("address", "comment")]
but I don't know how to do this with multiple files. I used the package dplr
and tidyxl
for this approach.
I would very much appreciate your help!
Upvotes: 0
Views: 243
Reputation: 66480
Here's an approach using purrr
:
Edit: altered solution to output the source file for each comment and to work with files lacking such a comment, as the OP specified the comment is there in "most" files.
library(tidyxl)
library(purrr)
# First, here's a list of xlsx files in the directory:
file_list <- list.files() %>%
.[str_detect(., ".xlsx")]
file_list
#[1] "test1.xlsx" "test2.xlsx" "test3 no comment.xlsx"
# Make a new tibble with two columns:
# file_name is the source file we're looking at
# comments extracts the comments in N8, if any
tibble(file_name = file_list,
comments = map(file_list,
~ xlsx_cells(.) %>%
subset(address == "N8", comment))) %>%
unnest(comments, keep_empty = TRUE)
## A tibble: 3 x 2
# file_name comment
# <chr> <chr>
#1 test1.xlsx Comment in file 1
#2 test2.xlsx Comment in file 2
#3 test3 no comment.xlsx NA
Upvotes: 2