Reputation: 317
I am trying to read data from Multiple Workbooks which have Multiple Worksheets using r. There are 10 workbooks and each has data in two worksheets.
Following code works in pulling data from the first sheet. However I'd like to also pull the data on another sheet in the same workbook. I am not sure how to specify the sheet name in the following code.
library(purrr)
library(readxl)
library(dplyr)
library(tidyr)
data_path <- "C:/Desktop/Test"
files <- dir(data_path, pattern = "*.xlsx")
weights_data <- data.frame(filename = files) %>%
mutate(file_contents = map(filename,
~ read_excel(file.path
(data_path, .))))
View(unnest(weights_Data))
Upvotes: 1
Views: 361
Reputation: 160872
read_excel
takes another argument that lets you specify a specific sheet:
sheet: Sheet to read. Either a string (the name of a sheet), or an
integer (the position of the sheet). Ignored if the sheet is
specified via 'range'. If neither argument specifies the
sheet, defaults to the first sheet.
With that, we need to expand your frame of paths to include sheets, easily done with readxl::excel_sheets
which for a single path returns a vector of sheet names.
Iteratively, for discussion/walk-through, though only the last block is required:
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(readxl)
data_frame(
path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
mutate(sheets = map(path, excel_sheets))
# # A tibble: 3 x 2
# path sheets
# <chr> <list>
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" <chr [2]>
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" <chr [2]>
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" <chr [2]>
This alone isn't immediately helpful, but we can unnest
it:
data_frame(
path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
mutate(sheets = map(path, excel_sheets)) %>%
unnest(sheets)
# # A tibble: 6 x 2
# path sheets
# <chr> <chr>
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet1
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet2
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet1
# 4 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet2
# 5 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet1
# 6 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet2
Now it should become clear that we now just need to iterate over each row using map2
or similar, and we get a nested tidy frame with data:
data_frame(
path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
mutate(sheets = map(path, excel_sheets)) %>%
unnest(sheets) %>%
mutate(data = map2(path, sheets, ~ read_excel(path = .x, sheet = .y)))
# # A tibble: 6 x 3
# path sheets data
# <chr> <chr> <list>
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet1 <tibble [32 x 11]>
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet2 <tibble [32 x 11]>
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet1 <tibble [32 x 11]>
# 4 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet2 <tibble [32 x 11]>
# 5 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet1 <tibble [32 x 11]>
# 6 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet2 <tibble [32 x 11]>
(I made a few excel workbooks, each had two sheets, each sheet had mtcars
on it. Nothing fancy.)
Upvotes: 2