Reputation: 197
I am trying to fill some columns based on a formula. I wrote the formula, ran the code and saved the workbook by calling saveWorkbook() function. When I open the excel to have a look everything seems fine. But when I read the data from the same sheet I am getting NAs for the columns that I filled earlier with formula.
Here is a simple example. Initially I have two columns: a and b
read.xlsx("data/rough.xlsx", sheet = "Sheet1", rows = c(1:4), cols = c(2,3))
a b
1 1 NA
2 2 NA
3 3 NA
I loaded the workbook and did the required changes
wb <- loadWorkbook("data/rough.xlsx")
formula_vector <- c(paste0("B", seq(2,4), "*2"))
writeFormula(wb, sheet = "Sheet1", x = formula_vector, startCol = 3, startRow = 2)
saveWorkbook(wb, "data/rough.xlsx", overwrite = TRUE)
After saving the workbook I ran the same line and got the same output
read.xlsx("data/rough.xlsx", sheet = "Sheet1", rows = c(1:4), cols = c(2,3))
a b
1 1 NA
2 2 NA
3 3 NA
this is what I actually have inside the sheet
Now I that I manually opened the file it is asking me whether to save or not even though I didn't change anything. If I save it and the run the same line in R...I am getting the correct values.
after manually saving the file
read.xlsx("data/rough.xlsx", sheet = "Sheet1", rows = c(1:4), cols = c(2,3))
a b
1 1 2
2 2 4
3 3 6
How can I resolve this? It seems like even after running the saveWorkbook() R is still holding some details of that workbook.
Please let me know if you come across any solutions. Thank you!
Upvotes: 1
Views: 2443
Reputation: 197
I found a solution from this
After running the saveWorkbook() I ran the following chunk
#create an instance of excel
xlApp <- COMCreate("Excel.Application")
#open the same file that you saved earlier with saveWorkbook() method
xlWbk <- xlApp$Workbooks()$Open("complete path to excel file")
#save the file and close it
xlWbk$Save()
xlWbk$Close()
#quit excel instance
xlApp$Quit()
rm(xlWbk, xlApp)
You need to load RDCOMClient package before running the above chunk. After running this I got the updated values
read.xlsx("data/rough.xlsx", sheet = "Sheet1", rows = c(1:4), cols = c(2,3))
a b
1 1 2
2 2 4
3 3 6
Basically all the modifications are done in openxlsx and RDCOMClient can be used to open and save the file instead of doing it manually. Hope this helps.
PS: I don't know much about VBA scripting in R and am looking for some resources. If you know of any, please share. Thank you!
Upvotes: 0
Reputation: 969
The answer @stefan provided is the correct one. In addition to this, there are a few issues regarding this in the openxlsx
bug tracker, see here or here.
The problem basically boils down to how the xlsx file format works. If we write the formula SUM(A2:A3)
from openxlsx
we create a cell like this:
<c r="A1">
<f>SUM(A2:A3)</f>
</c>
Once the file is opened in spreadsheet software like Excel, LibreOffice or Numbers, the formula gets evaluated. If the sum of A2
and A3
is 2
, the cell will become something like this:
<c r="A1">
<f>SUM(A2:A3)</f>
<v>2</v>
</c>
This is why Excel asks you to save the file, because once the file was loaded, the value was updated.
If the file is saved, the value can be read in openxlsx
and other programs and packages that simply evaluate v
node of a cell (AFAIK all R libraries do so). The only way to circumvent this, would be to implement a library that evaluates the formula and calculates the value. Or to use a headless libreoffice to evaluate the formula for you and save their output.
Why isn't the formula evaluated in R? Well for SUM
this might be a simple task, but Excel has a few other functions that are custom to Excel and the implementation is not straight forward, even if you ignore things like different rounding or linked workbooks. Therefore I guess it is unlikely that someone will work on this.
On a side note, not even Excel evaluates formulas all the time. With linked workbooks, I've seen that the v
node was picked unless the linked workbook was open. And of course this quite dangerous. After all there is plenty of software around that can modify xlsx files, but unless you know that a file does not contain formulas, you can never know if the values of formulas in the spreadsheet are still valid. Because, in the example above, there is nothing stopping you from modifying A2
and A3
in a way that the sum would no longer be 2
.
Upvotes: 1