Reputation: 147
I would like to have R return a list of all Excel files within a folder that contain one of a number of IDs within the file itself - I am not looking at filenames, for which list.files could be used. Below code sets the filepath and creates a list of the necessary IDs to search for, is there a simple way of having R loop through each file in the folder and return a list of files that have at least one matching ID within the data contained in the file itself, not the filename?
#Set filepath
MYFILEPATH <- "\\\\dcfnetwork\\data\\\\R\\Test Folder"
#set list of IDs to check for
IDList <- c("l21", "d66", "f53")
Upvotes: 0
Views: 453
Reputation: 10637
This will give your file paths to Escel files having at leat one id in at least one cell in at least one worksheet:
library(tidyverse)
library(readxl)
IDList <- c("l21", "d66", "f53")
names(IDList) <- IDList
data <-
tibble(id = IDList) %>%
expand_grid(file = list.files("files/", pattern = "xlsx?$", full.names = TRUE)) %>%
mutate(
# for all combos of id and file
contains_id = map2_lgl(id, file, function(id, file) {
# for all worksheets
excel_sheets(file) %>%
map(~ read_excel(file, sheet = .x)) %>%
# tibble to text
map(~ .x %>%
as.character() %>%
paste0(collapse = " ")) %>%
# detect pattern
map(~ .x %>% str_detect(id)) %>%
# one occurrence in any sheet is sufficient
any()
})
)
#> # A tibble: 6 x 3
#> id file contains_id
#> <chr> <chr> <lgl>
#> 1 l21 files//file1.xlsx FALSE
#> 2 l21 files//file2.xlsx TRUE
#> 3 d66 files//file1.xlsx FALSE
#> 4 d66 files//file2.xlsx TRUE
#> 5 f53 files//file1.xlsx TRUE
#> 6 f53 files//file2.xlsx FALSE
data %>%
filter(contains_id) %>%
pull(file) %>%
unique()
#> [1] "files//file1.xlsx" "files//file2.xlsx"
Created on 2021-11-25 by the reprex package (v2.0.1)
Sheet 2 of file1.xlsx:
x y
f53 1
f53 2
Sheet 1 of file2.xlsx:
x y
1 d66
2 l21
Upvotes: 1