Reputation: 314
When I'm using openxlsx
to save a dataframe, the resultant file ends up being damaged.
library(openxlsx)
# Making dummy dataframe
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.frame(Name, Age)
# Creating output file
output_path <- "check_corruption.xlsx"
wb <- createWorkbook(output_path)
addWorksheet(wb, "Sheet1")
# Write unformatted data
writeData(wb, sheet=1, df)
# Save fully formatted output
saveWorkbook(wb, output_path, overwrite=TRUE)
When I try to open check_corruption.xlsx I get the error
"We found a problem with some content in 'check_corruption.xlsx'. Do you want us to try to recover as much as we can?".
If I repair the file then the output seems fine but I don't understand what's causing the problem in the first place.
Upvotes: 4
Views: 2760
Reputation: 453
I had a similar problem when retrieving data from a database. R handles it well, but writing in into Excel caused the workbook to not open.
I fixed it by removing control characters in the dataframe. Here's the code:
library(dplyr)
df <- df %>%
# All columns with strings...
mutate(across(where(is.character),
# Remove control characters
~ gsub("[[:cntrl:]]", "", .))
)
In the OP example, this should go after df <- data.frame(Name, Age)
.
Upvotes: 0
Reputation: 1351
I also got the same error when opening a workbook created in openxlsx
, but the cause was different and - after repairing - some data was actually missing in the first column.
After a good deal of troubleshooting it turned out that I applied a style to column "0":
openxlsx::addStyle(wb, sheet=1, style=sty3, rows=1, cols=c(0:34), gridExpand = TRUE, stack = FALSE)
That's a big no-no. Columns should always start above 0: cols=c(1:34)
.
Upvotes: 1
Reputation: 314
It turns out this was due to an "&" symbol in the output path of the file, which was not supported by openxlsx. I opened an issue here which has been addressed, so if you are having this problem update your installation of openxlsx to the main branch of github.
Upvotes: 1