kswp
kswp

Reputation: 35

Readxl makes empty cells into blank tibbles--work around by replacing with NAs

I know this in an active issue with readxl (here), but it hasn't been resolved, so I'm looking for a work-around that works for my specific situation, since it's a little different than most questions related to this problem.

I am working with raw data coming off a machine that takes a bunch of different measurements and puts each into a different excel sheets of the same document. So I am trying to read each sheet into a different tibble, name them according to the sheet name, and cbind them into a single table that can be written to excel.

My problem is that sometimes the machine can't calculate certain measurements, so that sheet contains blank excel cells. That gets turned into a blank (0 x 0) tibble that then doesn't get included in the final table. Can't have that.

Here's the code that produces the list of tibbles:

uncleparse<-function(filename, destfile){
  library(readxl)
  library(tibble)
  library(writexl)
  sheets<-readxl::excel_sheets(filename)
  x<-lapply(sheets, function(X) readxl::read_excel(filename, sheet = X, range = "A3:B180"))

The result is a list of tibbles, not named yet, because some of them have dimensions of 0, 0 and most of 177, 2, so any time I try to do something with x, R complains that the elements don't all have the same dimensions.

What I've tried:

  1. I've tried specifying na = "" and na = " " in the read_excel argument and it didn't error, but it also didn't change the blank tibbles into 177 x 2 tibbles of NA's, like I wanted.

  2. I've tried making a tibble of NA's of the right dimensions and using an IF statement to replace the blank tibbles with the NA tibble:

n<-rep(NA, 177)
nt<-tibble::tibble(n, rows = rep(NA, 177)) #result is a 177 x 2 tibble of NA's
for (i in 1:96){ #there are always 96 sheets
  if (any(nrow(x[i])==0)) {
    x[i]<-nt
    }
  }

Same result--doesn't error, but also doesn't actually replace the blank tibbles.

I'm very new to R, so it's also possible these methods would work if I changed something simple. Appreciate any feedback you can give me.

Upvotes: 0

Views: 70

Answers (1)

LMc
LMc

Reputation: 18742

Here is an option, the most important part is the use of tibble::add_row which turns your data frame without any data into a one row data frame with all NA values:

rio::import_list("Book1.xlsx") |>
  purrr::modify_if(~ nrow(.x) == 0, ~ tibble::add_row(.x)) |>
  dplyr::bind_rows(.id = "sheet_name")

I created a local workbook with three sheets (Sheet1 - Sheet3). Sheet2 has column names, but no values. Below I have the structure of rio::import_list for use to create other solutions.

How it works

  1. rio::import_list will import all the sheets of an Excel workbook into a named list, with those list names coming from the sheet names.
  2. modify_if looks for list elements that have no rows, if found it will create an empty row.
  3. Lastly, bind_rows will stack those tibbles and create a new column named sheet_name that is the value of the list element name.

If you read the documentation for ?rio::import_list you will find additional information on how to control the data import.

Output

  sheet_name  A  B  C
1     Sheet1  1  2  3
2     Sheet1  1  2  3
3     Sheet1  1  2  3
4     Sheet2 NA NA NA
5     Sheet3  0  0  0
6     Sheet3  0  0  0
7     Sheet3  0  0  0

Data

Output from the rio::import_list step:

list(Sheet1 = structure(list(A = c(1, 1, 1), B = c(2, 2, 2), 
    C = c(3, 3, 3)), class = "data.frame", row.names = c(NA, 
3L)), Sheet2 = structure(list(A = logical(0), B = logical(0), 
    C = logical(0)), class = "data.frame", row.names = character(0)), 
    Sheet3 = structure(list(A = c(0, 0, 0), B = c(0, 0, 0), C = c(0, 
    0, 0)), class = "data.frame", row.names = c(NA, 3L)))

Upvotes: 0

Related Questions