Cybernetic
Cybernetic

Reputation: 13354

Read limited number of rows from excel file in R with read_excel

I am reading in excel files in R using the readxl package like this:

library(readxl)
file_names <- list.files(pattern = ".xlsx")

list_collection <- list() 
for(i in 1:length(file_names)){
  frame <- read_excel(file_names[i], )
  frame_sub <- frame[1:100,]
  list_collection[i] <- list(frame_sub)  
}

Since there are a lot of excel files, and I only want the first 100 rows. Obviously this is not efficient. Is there a way to read in only 100 rows from excel initially, instead of reading in this whole file and THEN subsetting?

Upvotes: 2

Views: 7335

Answers (2)

MeetMrMet
MeetMrMet

Reputation: 1359

Came across this while searching for the same thing - this has been added to the latest readxl update. To get the first 100 rows in your sheet you could do

library(readxl)
file_names <- list.files(pattern = ".xlsx")

list_collection <- list() 
for(i in 1:length(file_names)){
  frame <- read_excel(file_names[i], n_max = 100)
  list_collection[i] <- list(frame)  
}

Or you could specify particular rows using something like

frame <- read_excel(file_names[i], range = cell_rows(1:100))

Upvotes: 3

Rich Scriven
Rich Scriven

Reputation: 99371

Try xlsx::read.xlsx() instead. It has arguments for specifying the start and end rows. Also note that I have made some improvements to your for() loop (memory allocation being the most important).

library(xlsx)
## get file names
file_names <- list.files(pattern = "\\.xlsx$")
## allocate memory for our list
out <- vector("list", length(file_names)) 
## read the files and assign them to the list
for(i in seq_along(file_names)) {
    out[[i]] <- read.xlsx(file_names[i], startRow = 1, endRow = 100)  
}

Or you could do a named list by changing the for() loop to

for(file in file_names) {
    out[[file]] <- read.xlsx(file, startRow = 1, endRow = 100)  
}

Upvotes: 3

Related Questions