Reputation: 737
I have a excel sheet with varying number of sheets in it (it can have 2 sheets or 3 sheets or just depends on the user). Headers of all the sheets are same. (first two rows are considered as headers)
I want to merge all these sheets and put it into a single dataframe.
The files are of .xlsx format. and each sheets contain huge number of rows (30 columns and 8000 rows).
I am a beginner in reading excel files in R. I am going through them, in the mean time if anyone knows how to implement this please let me know.
An example of Excel sheet is something like this Data
PS: I want to implement all this in shiny. so, please do mention if there is any efficient method for shiny.
ui.R
fileInput('file2', h5('Choose Your Observation Data'), accept=c('text/csv','text/comma-separated-values,text/plain','.xlsx'))
server.R
b <- reactive({
fileinput2 <- input$file2
if (is.null(fileinput2))
return(NULL)
#xlfile <- list.files(pattern = ".xlsx")
xlfile <- fileinput2[1]
wb <- loadWorkbook(xl_file)
sheet_ct <- wb$getNumberOfSheets()
b <- rbindlist(pblapply(1:sheet_ct, function(x) {
res <- read.xlsx(xl_file, x)
}), fill=TRUE)
b <- b [-c(1),]
print (b)
})
Upvotes: 0
Views: 1527
Reputation: 78792
You can start with A million ways to connect R and Excel. This is a basic snippet to get you a bit further along using the xlsx
package. It's java-based and it's s l o w, hence the use of progress bars.
This snippet takes a very naive approach since every real world excel spreadsheet I've ever seen is usually a wretched pile of numerical madness and one can rarely guarantee the columns are all lined up properly and consistently named. To that end, I wantonly use rbindlist
from data.table
with the fill
option to deal with any column inconsistencies.
The result is not perfect (you'll need to take care of the extra header row), but you're reading from Excel, which is also far from perfect.
library(xlsx) # excel reading
library(pbapply) # free progress bars
library(data.table) # rbindlist
xl_file <- "Data.xlsx"
wb <- loadWorkbook(xl_file)
sheet_ct <- wb$getNumberOfSheets()
dat <- rbindlist(pblapply(1:sheet_ct, function(x) {
res <- read.xlsx(xl_file, x)
}), fill=TRUE)
head(dat)
## EN MN ED HO TM SL PH DI TA DI.1 CH PI
## 1: # # day hour degC NA NA µmol/L µmol/kg µmol/L µg/L µmol/L
## 2: 1 1 1 12 9.9 31.23 7.82 2126.1575 2151 15.3 0.93 NA
## 3: 1 1 2 36 9.59 31.17 7.84 2120.4175 2150 14.2 1.2044 0.69
## 4: 1 1 3 60 9.65 31.13 7.84 2110.885 2143 14.3 0.9137 2.85
## 5: 1 1 4 84 10.36 31.16 7.83 2105.4525 2137 13.8 0.7189 7.29
## 6: 1 1 5 108 10.06 31.13 7.84 2106.4775 2139 13.7 0.317 5.24
## PO PN PP DC DN DP TP Exp.num Mesocosm
## 1: µmol/L µmol/L µmol/L µmol/L µmol/L µmol/L µmol/L NA NA
## 2: NA 2.319 0.032 100.4 NA NA 5.6306 NA NA
## 3: 24.16 2.598 0.048 104.5 NA NA 2.3034 NA NA
## 4: 34.815 2.095 0.059 NA NA NA 2.5594 NA NA
## 5: 40.999 2.186 0.056 97.5 NA NA 5.8865 NA NA
## 6: 37.751 2.173 0.081 NA NA NA 6.1425 NA NA
## Exp.day Hour Temperature Salinity pH DIC DIN Chl.a PIC POC PON POP DOC
## 1: NA NA NA NA NA NA NA NA NA NA NA NA NA
## 2: NA NA NA NA NA NA NA NA NA NA NA NA NA
## 3: NA NA NA NA NA NA NA NA NA NA NA NA NA
## 4: NA NA NA NA NA NA NA NA NA NA NA NA NA
## 5: NA NA NA NA NA NA NA NA NA NA NA NA NA
## 6: NA NA NA NA NA NA NA NA NA NA NA NA NA
## DON DOP TEP
## 1: NA NA NA
## 2: NA NA NA
## 3: NA NA NA
## 4: NA NA NA
## 5: NA NA NA
## 6: NA NA NA
An alternate approach would be to batch convert your Excel files to CSV, which may have more efficacy in the long run.
Upvotes: 2