Reputation: 1294
I am reading multiple Excel workbooks which have different ranges to be read, and potentially live in different sheets inside each workbook. I am using a master file which contains the file name, the name I want to call the data, the range to be read, and the sheet (if it isn't sheet 1). This is my master file:
Files = structure(list(file = c("Alaska.xls", "Analysis of Y-chromosome STRs in Chile.xlsx",
"Bolivia.xlsx", "carribean.xlsx", "Chachapoya.xlsx", "Colombian.XLSX",
"ndigenous Maya population from Guatemala.xlsx", "Nicaragua Nunez.xls",
"Nicaragua.xls", "Palha Brazil.xls", "Patagonia.xls", "Promega Y23 Haplotypes Jun2019.xlsx",
"Roewer et al.XLS", "Rio de Janeiro.xls", "The geographic mosaic of Ecuadorian.xlsx",
"Xu2015Data-original.xlsx"), name = c("Alaska", "Chile", "Bolivia",
"Carribean", "Chachapoya", "Colombian", "Guatemala", "Nicaragua",
"Nicaragua", "Palha", "Patagonia", "Promega", "Roewer", "Rio",
"Ecuador", "Xu"), range = c("G3:X31", "E3:U981", "I4:X230", "C4:S611",
"C2:Y185", "I3:Q80", "D1:S101", "B1:R165", "AQ2:BF167", "G2:AB2534",
"B8:J108", "C2:AT226", "J1:Y1012", "B3:Q608", "G4:AB419", "C2:S981"
), sheet = c("Table S8 Y chromosome STRs", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "")), class = "data.frame", row.names = c(NA,
-16L))
And it looks like this:
> Files
file name range sheet
1 Alaska.xls Alaska G3:X31 Table S8 Y chromosome STRs
2 Analysis of Y-chromosome STRs in Chile.xlsx Chile E3:U981
3 Bolivia.xlsx Bolivia I4:X230
4 carribean.xlsx Carribean C4:S611
5 Chachapoya.xlsx Chachapoya C2:Y185
6 Colombian.XLSX Colombian I3:Q80
7 ndigenous Maya population from Guatemala.xlsx Guatemala D1:S101
8 Nicaragua Nunez.xls Nicaragua B1:R165
9 Nicaragua.xls Nicaragua AQ2:BF167
10 Palha Brazil.xls Palha G2:AB2534
11 Patagonia.xls Patagonia B8:J108
12 Promega Y23 Haplotypes Jun2019.xlsx Promega C2:AT226
13 Roewer et al.XLS Roewer J1:Y1012
14 Rio de Janeiro.xls Rio B3:Q608
15 The geographic mosaic of Ecuadorian.xlsx Ecuador G4:AB419
16 Xu2015Data-original.xlsx Xu C2:S981
I would like to iterate over each row of this data frame, and use read_excel
to read in the file, and store the returned data frame in a list with its name set to name
.
I tried using just apply
which is ugly and doesn't work:
readFiles = function(){
Files = read.csv(system.file("extdata", "files.csv", package = "purps"))
Sheets = vector(mode = "list", length = length(Files$File))
names(Sheets) = Files$Name
readFile = function(row){
row = as.list(row)
path = system.file("extdata", file, package = "purps")
read_excel(path, range = row$range, sheet = ifelse(row$sheet == "", NULL, row$sheet))
}
Sheets = apply(Files, 1, readFile)
return(Sheets)
}
> readFiles()
Error in file.path(packagePath, ...) :
cannot coerce type 'closure' to vector of type 'character'
I am sure there is an elegant solution using purrr
or something else I don't know about! I am also sure I could just do this with a loop, but there has to be a more compact way.
Upvotes: 0
Views: 27
Reputation: 388817
You could try to split every row in Files
as list of dataframes and then pass it to readFiles
function.
readFiles = function(row){
path = system.file("extdata", file, package = "purps")
data <- readxl::read_excel(path, range = row$range,
sheet = ifelse(row$sheet == "", NULL, row$sheet))
return(data)
}
list_data <- lapply(split(Files, seq(nrow(Files))), readFiles)
To name the list you could do :
names(list_data) <- Files$name
Upvotes: 1