New2Programming
New2Programming

Reputation: 383

Reading multiple xl files into dataframe

I have been using XLConnect function loadworkbook to load each xlsx file into R then rbind to merge them together. what is the best way of doing this instead of writing multiple df to later merge them. I am trying to use the code below to merge my excel files into 2 dataframes(2 sheet names for most files). The columns are always the same but the file names will change.

Current /slow way

require(XLConnect)
df <- loadWorkbook(paste(location,'UK.xlsx',sep=""))
dfb <- loadWorkbook(paste(location,'US.xlsx',sep=""))
UK <-readWorksheet(df,sheet="School",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
US <-readWorksheet(dfb,sheet="School",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
School  <-  rbind(UK,US)
UK <-readWorksheet(df,sheet="College",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
US <-readWorksheet(dfb,sheet="College",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
College  <- rbind(UK,US)

New code

require(readxl) filelist<- list.files(location,pattern='xlsx',full.names = T) How can I read each sheetname into a dataframe when not every file has both sheetnames. I need 2 dataframes 1 for School and 1 for College. I think I need to try something like Schools <-lapply(filelist, read_excel, sheet="School") but I get Error: Sheet 'School' not found. I think this error is because sheet School is not on every file. I am using list.files because the filenames are not always the same.

Upvotes: 0

Views: 590

Answers (1)

Daniel
Daniel

Reputation: 7832

What about this approach?

library(purrr)
library(readxl)
# filenames to xl-sheets
files <- sprintf("Mappe%i.xlsx", 1:3)

# read only df for xl-files with school-sheet
xl_school <- map_if(files, ~ "School" %in% excel_sheets(.x), ~read_excel(.x))
# read only df for xl-files with college-sheet
xl_college <- map_if(files, ~ "College" %in% excel_sheets(.x), ~read_excel(.x))

# combine school-files to data frame (repeat same for college)
school_df <- map_df(xl_school, function(x) if(is.data.frame(x)) x)

school_df
#> # A tibble: 3 × 1
#>      Test
#>     <chr>
#> 1    fdsf
#> 2  543534
#> 3 gfdgfdd

You might need to force the column type to be text. Just add col_types = "text" to the read_excel()-call:

# read only df for xl-files with school-sheet
xl_school <- map_if(files, ~ "School" %in% excel_sheets(.x), ~read_excel(.x, col_types = "text"))
# read only df for xl-files with college-sheet
xl_college <- map_if(files, ~ "College" %in% excel_sheets(.x), ~read_excel(.x, col_types = "text"))

enter image description here enter image description here enter image description here

Upvotes: 1

Related Questions