How can I skip Excel Files if they do not contain a specific Sheet?

Here's the code I'm using:

library(readxl) 
setwd(file.path(dirname("~"), "/Shared Documents/Programs/Data and Reporting/Data Quality Reports/Org Level Data"))
lst = list.files(pattern="*.xlsx")
df = data.frame()
for(table in lst){
  dataFromExcel <- read_excel(table, sheet = "Project Summary")
  df <- rbind(df,dataFromExcel)
}
write.csv(df, "_Project Level data.csv")

I basically know nothing about R, and simply mashed together code from a couple sites, editing what little I understood.

Here's the scenario: I have a bunch of Excel files that I download and put into a folder called "Org Level Data". I run this script and it creates a new file with all the data in each file's "Project Summary" sheet. However, it errors out if one of those files does not contain a sheet called "Project Summary", which will be quite a few files. I can get around this by removing those files from the folders, but I'd really like this script to just skip those files and ignore them, if possible.

I saw something about read_excel_safely but I cannot figure out how to insert that into my code, since I understand very little about the "read_excel" and "rbind" sections.

I tried adding , na = "---") to the read_excel() function, thinking that it would handle that not existing, but it'd didn't add or take away anything from the script running.

I've also looked into trying to make lst check for that "Project Summary" sheet when being created, but I don't know how to do that.

Upvotes: 0

Views: 44

Answers (1)

stefan
stefan

Reputation: 125373

You can use an if with readxl::excel_sheets to check whether a file contains a sheet with the required name and if not skip that file in your for loop by passing on to the next file, e.g. in the code below the second file is skipped.

Using some fake example data and excel files:

library(readxl)

path <- tempdir()

### Create example files
writexl::write_xlsx(
  list(
    "Project Summary" = mtcars
  ),
  file.path(path, "mtcars.xlsx")
)

writexl::write_xlsx(
  list(
    "Sheet" = iris
  ),
  file.path(path, "iris.xlsx")
)
###

setwd(path)

lst <- list.files(pattern = "*.xlsx")
df <- data.frame()
for (table in lst) {
  if (!"Project Summary" %in% excel_sheets(table)) next
  dataFromExcel <- read_excel(table, sheet = "Project Summary")
  df <- rbind(df, dataFromExcel)
}
#write.csv(df, "_Project Level data.csv")

df
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows

Upvotes: 1

Related Questions