Reading excel file into R and Merging all the sheets in it into a single dataframe

Question

I have a excel sheet with varying number of sheets in it (it can have 2 sheets or 3 sheets or just depends on the user). Headers of all the sheets are same. (first two rows are considered as headers)

I want to merge all these sheets and put it into a single dataframe.

The files are of .xlsx format. and each sheets contain huge number of rows (30 columns and 8000 rows).

I am a beginner in reading excel files in R. I am going through them, in the mean time if anyone knows how to implement this please let me know.

An example of Excel sheet is something like this Data

PS: I want to implement all this in shiny. so, please do mention if there is any efficient method for shiny.

ui.R

fileInput('file2', h5('Choose Your Observation Data'), accept=c('text/csv','text/comma-separated-values,text/plain','.xlsx'))

server.R

b <- reactive({
   fileinput2 <- input$file2
   if (is.null(fileinput2))
   return(NULL)
   #xlfile <- list.files(pattern = ".xlsx")
   xlfile <- fileinput2[1]
   wb <- loadWorkbook(xl_file)
   sheet_ct <- wb$getNumberOfSheets()
   b <- rbindlist(pblapply(1:sheet_ct, function(x) {
     res <- read.xlsx(xl_file, x)
   }), fill=TRUE)
   b <- b [-c(1),]
   print (b)
   })

hrbrmstr · Accepted Answer

You can start with A million ways to connect R and Excel. This is a basic snippet to get you a bit further along using the xlsx package. It's java-based and it's s l o w, hence the use of progress bars.

This snippet takes a very naive approach since every real world excel spreadsheet I've ever seen is usually a wretched pile of numerical madness and one can rarely guarantee the columns are all lined up properly and consistently named. To that end, I wantonly use rbindlist from data.table with the fill option to deal with any column inconsistencies.

The result is not perfect (you'll need to take care of the extra header row), but you're reading from Excel, which is also far from perfect.

library(xlsx)        # excel reading
library(pbapply)     # free progress bars
library(data.table)  # rbindlist

xl_file <- "Data.xlsx"

wb <- loadWorkbook(xl_file)
sheet_ct <- wb$getNumberOfSheets()

dat <- rbindlist(pblapply(1:sheet_ct, function(x) {
  res <- read.xlsx(xl_file, x)
}), fill=TRUE)

head(dat)
##    EN MN  ED   HO    TM    SL   PH        DI      TA   DI.1     CH     PI
## 1:  #  # day hour  degC    NA   NA    µmol/L µmol/kg µmol/L   µg/L µmol/L
## 2:  1  1   1   12   9.9 31.23 7.82 2126.1575    2151   15.3   0.93     NA
## 3:  1  1   2   36  9.59 31.17 7.84 2120.4175    2150   14.2 1.2044   0.69
## 4:  1  1   3   60  9.65 31.13 7.84  2110.885    2143   14.3 0.9137   2.85
## 5:  1  1   4   84 10.36 31.16 7.83 2105.4525    2137   13.8 0.7189   7.29
## 6:  1  1   5  108 10.06 31.13 7.84 2106.4775    2139   13.7  0.317   5.24
##        PO     PN     PP     DC     DN     DP     TP Exp.num Mesocosm
## 1: µmol/L µmol/L µmol/L µmol/L µmol/L µmol/L µmol/L      NA       NA
## 2:     NA  2.319  0.032  100.4     NA     NA 5.6306      NA       NA
## 3:  24.16  2.598  0.048  104.5     NA     NA 2.3034      NA       NA
## 4: 34.815  2.095  0.059     NA     NA     NA 2.5594      NA       NA
## 5: 40.999  2.186  0.056   97.5     NA     NA 5.8865      NA       NA
## 6: 37.751  2.173  0.081     NA     NA     NA 6.1425      NA       NA
##    Exp.day Hour Temperature Salinity pH DIC DIN Chl.a PIC POC PON POP DOC
## 1:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 2:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 3:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 4:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 5:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 6:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
##    DON DOP TEP
## 1:  NA  NA  NA
## 2:  NA  NA  NA
## 3:  NA  NA  NA
## 4:  NA  NA  NA
## 5:  NA  NA  NA
## 6:  NA  NA  NA

An alternate approach would be to batch convert your Excel files to CSV, which may have more efficacy in the long run.

Reading excel file into R and Merging all the sheets in it into a single dataframe

Answers (1)

Related Questions