Reputation: 3
I am trying to combine a specific cell range within multiple Excel workbooks (which each have multiple sheets) into an R dataframe. Each of the Excel workbooks has an 'Instructions' sheet that I would like to ignore, then all subsequent sheets are structured in the same layout (and I would like to select cells H4:R17 from each). So far, I have managed to combine all cells across all worksheets within all workbooks into one very messy dataframe. However, I am struggling to identify the correct code to ignore the 'Instructions' sheet when reading this in and to subset by the required cell range (H4:R17). I've included my code below:
Any ideas?
#how to join multiple files based on multiple sheets
read_multiple_excel <- function(path) {
path %>%
excel_sheets() %>%
set_names() %>%
map_df(read_excel,path = path)
}
#joining all data.frames
data_df <- dir_ls(regexp = "xlsx") %>%
map_df(read_multiple_excel,
.id = "workbook_name")
str(data_df)
Upvotes: 0
Views: 356
Reputation: 8880
library(tidyverse)
library(readxl)
files <- list.files(
path = "./sourse/xlsx",
pattern = "\\.xlsx$",
full.names = T
)
list of files
[1] "./sourse/xlsx/df1.xlsx" "./sourse/xlsx/df2.xlsx"
list of sheet files
link <- data.frame(files = files) %>%
mutate(sheets = map(files, excel_sheets)) %>%
unnest(sheets) %>%
filter(sheets != "instruction")
# A tibble: 4 x 2
files sheets
<chr> <chr>
1 ./sourse/xlsx/df1.xlsx sh1
2 ./sourse/xlsx/df1.xlsx sh2
3 ./sourse/xlsx/df2.xlsx sh1
4 ./sourse/xlsx/df2.xlsx sh2
result
map2_df(.x = set_names(link$files), .y = link$sheets, read_xlsx, .id = "workbook", range = "A1:B3")
# A tibble: 8 x 3
workbook x y
<chr> <dbl> <dbl>
1 ./sourse/xlsx/df1.xlsx 1 3
2 ./sourse/xlsx/df1.xlsx 2 4
3 ./sourse/xlsx/df1.xlsx 5 7
4 ./sourse/xlsx/df1.xlsx 6 8
5 ./sourse/xlsx/df2.xlsx 9 11
6 ./sourse/xlsx/df2.xlsx 10 12
7 ./sourse/xlsx/df2.xlsx 13 15
8 ./sourse/xlsx/df2.xlsx 14 16
Upvotes: 1