goblinshark
goblinshark

Reputation: 314

openxlsx writeData() and saveWorkbook() create corrupted Excel file

When I'm using openxlsx to save a dataframe, the resultant file ends up being damaged.

library(openxlsx)

# Making dummy dataframe
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)

df <- data.frame(Name, Age)

# Creating output file 
output_path <- "check_corruption.xlsx"
wb <- createWorkbook(output_path) 
addWorksheet(wb, "Sheet1")

# Write unformatted data
writeData(wb, sheet=1, df)

# Save fully formatted output
saveWorkbook(wb, output_path, overwrite=TRUE)

When I try to open check_corruption.xlsx I get the error

"We found a problem with some content in 'check_corruption.xlsx'. Do you want us to try to recover as much as we can?".

If I repair the file then the output seems fine but I don't understand what's causing the problem in the first place.

Upvotes: 4

Views: 2760

Answers (3)

mkranj
mkranj

Reputation: 453

I had a similar problem when retrieving data from a database. R handles it well, but writing in into Excel caused the workbook to not open.

I fixed it by removing control characters in the dataframe. Here's the code:

library(dplyr)

df <- df %>%
    # All columns with strings...
    mutate(across(where(is.character),
    # Remove control characters
                  ~ gsub("[[:cntrl:]]", "", .))
           )

In the OP example, this should go after df <- data.frame(Name, Age).

Upvotes: 0

Michal J Figurski
Michal J Figurski

Reputation: 1351

I also got the same error when opening a workbook created in openxlsx, but the cause was different and - after repairing - some data was actually missing in the first column.

After a good deal of troubleshooting it turned out that I applied a style to column "0":

openxlsx::addStyle(wb, sheet=1, style=sty3, rows=1, cols=c(0:34), gridExpand = TRUE, stack = FALSE)

That's a big no-no. Columns should always start above 0: cols=c(1:34).

Upvotes: 1

goblinshark
goblinshark

Reputation: 314

It turns out this was due to an "&" symbol in the output path of the file, which was not supported by openxlsx. I opened an issue here which has been addressed, so if you are having this problem update your installation of openxlsx to the main branch of github.

Upvotes: 1

Related Questions