R_Student
R_Student

Reputation: 789

Concatenate columns from Excel file based on sheet name and column's name

I have an Excel file that has multiple sheetnames and these sheet names don't always have the same structure. I want to be able to read the Excel file, read only some specifics sheets, select some specific columns and then create a concatenated table I have the following Excel file:

File_Mother.xlsx it has 5 sheets but we would only need those sheets that have the pattern "inning_19" at the beginning of their names so "inning_195" and "inning_1988" will be of interest if they exist within the Excel file, extract the columns ID_MATCH and SET_GAME along with all other columns that starts with the pattern "id_complete_game_on" and finally create a column that will let me know the sheet name that this info was taken from when we bind vertically all the data selected. For example:

From File_Mother.xlsx I have sheet.name = "inning_195" with the following info

ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp
X01 M cleared not cleared
X85 Q cleared cleared
X89 U cleared cleared

From File_Mother.xlsx I have sheet.name = "inning_1988" with the following info

ID_MATCH SET_GAME id_complete_game_on_demand_one id_complete_game_on_camp
IKX01 MR completed not cleared
IKX02 DD not completed cleared
IKX03 TT draw cleared

after the desired data wrangling the output will be

ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp id_complete_game_on_demand_one sheet name
X01 M cleared not cleared NA inning_195
X85 Q cleared cleared NA inning_195
X89 U cleared cleared NA inning_195
IKX01 MR NA NA completed inning_1988
IKX02 DD NA NA not completed inning_1988
IKX03 TT NA NA draw inning_1988

Upvotes: -1

Views: 474

Answers (1)

Kra.P
Kra.P

Reputation: 15153

I slightly edit reading all sheets of excel by @Jeromy Anglim.

library(dplyr)
library(purrr)

read_excel_somesheets <- function(filename, key,tibble = FALSE) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X) %>% mutate(sheet_name = X))
  if(!tibble) x <- lapply(x, as.data.frame)
  names(x) <- sheets
  y <- x[grepl(key, names(x))] %>%
    reduce(full_join)
  y
}

Then you may try

read_excel_somesheets("your_path/File_Mother.xlsx", "inning_19")

And result is like

  ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp  sheet_name id_complete_game_on_demand_one
1      X01        M                   cleared              not cleared  inning_195                           <NA>
2      X85        Q                   cleared                  cleared  inning_195                           <NA>
3      X89        U                   cleared                  cleared  inning_195                           <NA>
4    IKX01       MR                      <NA>              not cleared inning_1988                      completed
5    IKX02       DD                      <NA>                  cleared inning_1988                  not completed
6    IKX03       TT                      <NA>                  cleared inning_1988                           draw

Upvotes: 1

Related Questions