wernor
wernor

Reputation: 421

R openxlsx writeFormula gives a corrupted file

I am trying to overwrite an Excel file with some new data.

My new data looks like this

new_data <- structure(list(date=c(1:30),var1 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), var2 = c("=B13-B12", 
"=B14-B13", "=B15-B14", "=B16-B15", "=B17-B16", "=B18-B17", "=B19-B18", 
"=B20-B19", "=B21-B20", "=B22-B21", "=B23-B22", "=B24-B23", "=B25-B24", 
"=B26-B25", "=B27-B26", "=B28-B27", "=B29-B28", "=B30-B29", "=B31-B30", 
"=B32-B31", "=B33-B32", "=B34-B33", "=B35-B34", "=B36-B35", "=B37-B36", 
"=B38-B37", "=B39-B38", "=B40-B39", "=B41-B40", "=B42-B41")), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

My dataset has more columns but those are the ones I identified as the ones messing up the file, specifically the var2 one.

Then I just turn my data from wide to long so I can group the data and write all columns with group_map.

update <- new_data %>%
                mutate_if(is.numeric,as.character) %>%
                pivot_longer(-date,
                             names_to = "var",
                             values_to = "value") %>% 
                select(var,value) %>% 
                mutate(col=c(1:2))

(NAs are actually in a numeric column, that is why I need to change numeric to character -plus I have more numeric columns within my dataset)

This is how I am writing my data into the excel file.

library(openxlsx)
wb <- loadWorkbook(paste(path,"file.xlsx",sep='/'))

update %>% 
  group_by(var) %>% 
  group_map(~writeFormula(wb=wb, sheet ="FirstSheet", x=.x$value,
                          startCol=.x$col,startRow=13))
saveWorkbook(wb,paste(path,"new_file.xlsx",sep='/'),overwrite = TRUE)

I am not sure what I am doing wrong but I guess it is due to the NA values since I have other columns coded the exact same way and they seem to be working just fine (with some I get a warning saying that the formula differs from the ones around them but I can live with that).

How should I manage this NA values? (they need to remain as missing values, for example a zero would be incorrect)

I tried changing them to empty strings (new_data[is.na(new_data)]<-"") before writing them into the file but the problem is still there.

Note: The message I get when I try to open the file is that it needs to be repaired. If I do repair it, it looks fine but I cannot reopen it again once closed.

Upvotes: 1

Views: 1201

Answers (1)

gaut
gaut

Reputation: 5958

The below works for me as expected: non-missing formulas are written, NA's are left blank. No corrupted message from Excel.

update_df <- new_data %>%
  mutate_if(is.numeric,as.character) %>%
  pivot_longer(-date,
               names_to = "var",
               values_to = "value") %>% 
  select(var,value) %>% 
  mutate(col=3)



library(openxlsx)
wb <- loadWorkbook(paste("file.xlsx"))
  # group_by(var) %>% # filter(is.na(value)) %>% 
  purrr::map(updatedf,.f= ~ writeFormula(wb, x= updatedf$value, sheet=1, startCol = c(1), startRow = 1))
saveWorkbook(wb,"new_file.xlsx",overwrite = TRUE)

enter image description here

Check if:

  • Is file.xlsx corrupted? What happens if you load it and write it without any modifications, can you open file.xlsx and new_file.xlsx then without a warning?
  • Can you reproduce the above solution by using a blank file.xlsx? If you still see corruption in new_file.xlsx after starting from a blank file.xlsx, this would indicate an issue with either excel or openxlsx.
  • What version of Excel, R and openxlsx are you running?

My config:

R version 4.0.4 (2021-02-15)
openxlsx_4.2.5
Microsoft® Excel® 2019 MSO (Version 2207 Build 16.0.15427.20166) 32-bit

EDIT After discussion, the problem might come from overwriting existing formulas.

  • Try to write the same formulas into other cells in your excel file where you're sure they won't interfere
  • Try deleting existing content before writing with something like:

deleteData(wb, sheet = 1,cols = unique(update_df$col), rows = 1:60, gridExpand = TRUE)

Upvotes: 1

Related Questions