Dal
Dal

Reputation: 317

r - How to Read Data from Multiple Workbooks having multiple Worksheets into R?

I am trying to read data from Multiple Workbooks which have Multiple Worksheets using r. There are 10 workbooks and each has data in two worksheets.

Following code works in pulling data from the first sheet. However I'd like to also pull the data on another sheet in the same workbook. I am not sure how to specify the sheet name in the following code.

library(purrr)
library(readxl)
library(dplyr)
library(tidyr)

data_path <- "C:/Desktop/Test"

files <- dir(data_path, pattern = "*.xlsx")


weights_data <- data.frame(filename = files) %>%
               mutate(file_contents = map(filename,
                                                 ~ read_excel(file.path
                                                              (data_path,  .))))

View(unnest(weights_Data))

Upvotes: 1

Views: 361

Answers (1)

r2evans
r2evans

Reputation: 160872

read_excel takes another argument that lets you specify a specific sheet:

sheet: Sheet to read. Either a string (the name of a sheet), or an
       integer (the position of the sheet). Ignored if the sheet is
       specified via 'range'. If neither argument specifies the
       sheet, defaults to the first sheet.

With that, we need to expand your frame of paths to include sheets, easily done with readxl::excel_sheets which for a single path returns a vector of sheet names.

Iteratively, for discussion/walk-through, though only the last block is required:

library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(readxl)

data_frame(
  path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
  mutate(sheets = map(path, excel_sheets))
# # A tibble: 3 x 2
#   path                                        sheets   
#   <chr>                                       <list>   
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" <chr [2]>
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" <chr [2]>
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" <chr [2]>

This alone isn't immediately helpful, but we can unnest it:

data_frame(
  path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
  mutate(sheets = map(path, excel_sheets)) %>%
  unnest(sheets)
# # A tibble: 6 x 2
#   path                                        sheets
#   <chr>                                       <chr> 
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet1
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet2
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet1
# 4 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet2
# 5 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet1
# 6 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet2

Now it should become clear that we now just need to iterate over each row using map2 or similar, and we get a nested tidy frame with data:

data_frame(
  path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
  mutate(sheets = map(path, excel_sheets)) %>%
  unnest(sheets) %>%
  mutate(data = map2(path, sheets, ~ read_excel(path = .x, sheet = .y)))
# # A tibble: 6 x 3
#   path                                        sheets data              
#   <chr>                                       <chr>  <list>            
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet1 <tibble [32 x 11]>
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet2 <tibble [32 x 11]>
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet1 <tibble [32 x 11]>
# 4 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet2 <tibble [32 x 11]>
# 5 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet1 <tibble [32 x 11]>
# 6 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet2 <tibble [32 x 11]>

(I made a few excel workbooks, each had two sheets, each sheet had mtcars on it. Nothing fancy.)

Upvotes: 2

Related Questions