Reputation: 35
I know this in an active issue with readxl (here), but it hasn't been resolved, so I'm looking for a work-around that works for my specific situation, since it's a little different than most questions related to this problem.
I am working with raw data coming off a machine that takes a bunch of different measurements and puts each into a different excel sheets of the same document. So I am trying to read each sheet into a different tibble, name them according to the sheet name, and cbind them into a single table that can be written to excel.
My problem is that sometimes the machine can't calculate certain measurements, so that sheet contains blank excel cells. That gets turned into a blank (0 x 0) tibble that then doesn't get included in the final table. Can't have that.
Here's the code that produces the list of tibbles:
uncleparse<-function(filename, destfile){
library(readxl)
library(tibble)
library(writexl)
sheets<-readxl::excel_sheets(filename)
x<-lapply(sheets, function(X) readxl::read_excel(filename, sheet = X, range = "A3:B180"))
The result is a list of tibbles, not named yet, because some of them have dimensions of 0, 0 and most of 177, 2, so any time I try to do something with x, R complains that the elements don't all have the same dimensions.
What I've tried:
I've tried specifying na = ""
and na = " "
in the read_excel argument and it didn't error, but it also didn't change the blank tibbles into 177 x 2 tibbles of NA's, like I wanted.
I've tried making a tibble of NA's of the right dimensions and using an IF statement to replace the blank tibbles with the NA tibble:
n<-rep(NA, 177)
nt<-tibble::tibble(n, rows = rep(NA, 177)) #result is a 177 x 2 tibble of NA's
for (i in 1:96){ #there are always 96 sheets
if (any(nrow(x[i])==0)) {
x[i]<-nt
}
}
Same result--doesn't error, but also doesn't actually replace the blank tibbles.
I'm very new to R, so it's also possible these methods would work if I changed something simple. Appreciate any feedback you can give me.
Upvotes: 0
Views: 70
Reputation: 18742
Here is an option, the most important part is the use of tibble::add_row
which turns your data frame without any data into a one row data frame with all NA
values:
rio::import_list("Book1.xlsx") |>
purrr::modify_if(~ nrow(.x) == 0, ~ tibble::add_row(.x)) |>
dplyr::bind_rows(.id = "sheet_name")
I created a local workbook with three sheets (Sheet1 - Sheet3). Sheet2 has column names, but no values. Below I have the structure of rio::import_list
for use to create other solutions.
How it works
rio::import_list
will import all the sheets of an Excel workbook into a named list, with those list names coming from the sheet names.modify_if
looks for list elements that have no rows, if found it will create an empty row.bind_rows
will stack those tibbles and create a new column named sheet_name
that is the value of the list element name.If you read the documentation for ?rio::import_list
you will find additional information on how to control the data import.
Output
sheet_name A B C
1 Sheet1 1 2 3
2 Sheet1 1 2 3
3 Sheet1 1 2 3
4 Sheet2 NA NA NA
5 Sheet3 0 0 0
6 Sheet3 0 0 0
7 Sheet3 0 0 0
Data
Output from the rio::import_list
step:
list(Sheet1 = structure(list(A = c(1, 1, 1), B = c(2, 2, 2),
C = c(3, 3, 3)), class = "data.frame", row.names = c(NA,
3L)), Sheet2 = structure(list(A = logical(0), B = logical(0),
C = logical(0)), class = "data.frame", row.names = character(0)),
Sheet3 = structure(list(A = c(0, 0, 0), B = c(0, 0, 0), C = c(0,
0, 0)), class = "data.frame", row.names = c(NA, 3L)))
Upvotes: 0