Reputation: 13354
I am reading in excel files in R using the readxl package like this:
library(readxl)
file_names <- list.files(pattern = ".xlsx")
list_collection <- list()
for(i in 1:length(file_names)){
frame <- read_excel(file_names[i], )
frame_sub <- frame[1:100,]
list_collection[i] <- list(frame_sub)
}
Since there are a lot of excel files, and I only want the first 100 rows. Obviously this is not efficient. Is there a way to read in only 100 rows from excel initially, instead of reading in this whole file and THEN subsetting?
Upvotes: 2
Views: 7335
Reputation: 1359
Came across this while searching for the same thing - this has been added to the latest readxl
update. To get the first 100 rows in your sheet you could do
library(readxl)
file_names <- list.files(pattern = ".xlsx")
list_collection <- list()
for(i in 1:length(file_names)){
frame <- read_excel(file_names[i], n_max = 100)
list_collection[i] <- list(frame)
}
Or you could specify particular rows using something like
frame <- read_excel(file_names[i], range = cell_rows(1:100))
Upvotes: 3
Reputation: 99371
Try xlsx::read.xlsx()
instead. It has arguments for specifying the start and end rows. Also note that I have made some improvements to your for()
loop (memory allocation being the most important).
library(xlsx)
## get file names
file_names <- list.files(pattern = "\\.xlsx$")
## allocate memory for our list
out <- vector("list", length(file_names))
## read the files and assign them to the list
for(i in seq_along(file_names)) {
out[[i]] <- read.xlsx(file_names[i], startRow = 1, endRow = 100)
}
Or you could do a named list by changing the for()
loop to
for(file in file_names) {
out[[file]] <- read.xlsx(file, startRow = 1, endRow = 100)
}
Upvotes: 3