JeffWithpetersen
JeffWithpetersen

Reputation: 147

Create list of all files within a folder where one of a number of IDs are contained with the file dataframe, not the file name

I would like to have R return a list of all Excel files within a folder that contain one of a number of IDs within the file itself - I am not looking at filenames, for which list.files could be used. Below code sets the filepath and creates a list of the necessary IDs to search for, is there a simple way of having R loop through each file in the folder and return a list of files that have at least one matching ID within the data contained in the file itself, not the filename?

#Set filepath
MYFILEPATH <- "\\\\dcfnetwork\\data\\\\R\\Test Folder"

#set list of IDs to check for
IDList <- c("l21", "d66", "f53")

Upvotes: 0

Views: 453

Answers (1)

danlooo
danlooo

Reputation: 10637

This will give your file paths to Escel files having at leat one id in at least one cell in at least one worksheet:

library(tidyverse)
library(readxl)

IDList <- c("l21", "d66", "f53")
names(IDList) <- IDList

data <-
  tibble(id = IDList) %>%
  expand_grid(file = list.files("files/", pattern = "xlsx?$", full.names = TRUE)) %>%
  mutate(
    # for all combos of id and file
    contains_id = map2_lgl(id, file, function(id, file) {
      # for all worksheets
      excel_sheets(file) %>%
        map(~ read_excel(file, sheet = .x)) %>%
        # tibble to text
        map(~ .x %>%
          as.character() %>%
          paste0(collapse = " ")) %>%
        # detect pattern
        map(~ .x %>% str_detect(id)) %>%
        # one occurrence in any sheet is sufficient
        any()
    })
  )
#> # A tibble: 6 x 3
#>   id    file              contains_id
#>   <chr> <chr>             <lgl>      
#> 1 l21   files//file1.xlsx FALSE      
#> 2 l21   files//file2.xlsx TRUE       
#> 3 d66   files//file1.xlsx FALSE      
#> 4 d66   files//file2.xlsx TRUE       
#> 5 f53   files//file1.xlsx TRUE       
#> 6 f53   files//file2.xlsx FALSE

data %>%
  filter(contains_id) %>%
  pull(file) %>%
  unique()
#> [1] "files//file1.xlsx" "files//file2.xlsx"

Created on 2021-11-25 by the reprex package (v2.0.1)

Example data

Sheet 2 of file1.xlsx:

x   y
f53 1
f53 2

Sheet 1 of file2.xlsx:

x   y
1   d66
2   l21

Upvotes: 1

Related Questions