Pablo
Pablo

Reputation: 29519

How to modify Excel sheet without losing extensions?

I am trying to modify excel file, which has number of VBA actions(not created by me). I had gentle attempt to modify single combo box item.

from openpyxl import load_workbook

# Load the workbook
workbook = load_workbook('input.xlsx')

# Select the worksheet
worksheet = workbook['Monthly']

# Change the value of the cell C5
worksheet['C5'] = 9

# Save the workbook with a new name
workbook.save('output.xlsx')

These are the warnings I got:

UserWarning: Data Validation extension is not supported and will be removed warn(msg)

UserWarning: Conditional Formatting extension is not supported and will be removed warn(msg)

The output file size is much smaller and some functionality is gone, although the combo box value has been modified. My question is whether there is a library which will preserve the functionality of Validation / Conditional Formatting extension, while allowing me to modify the cell values?

Upvotes: 1

Views: 100

Answers (1)

Axel Richter
Axel Richter

Reputation: 61925

When attempting to create libraries that provide an application programming interface (API) to Microsoft Office files using the Office Open XML file format without requiring installed Microsoft Office applications, there are primarily two approaches.

The one is to take the ZIP archive of the Office Open XML file, get the XML parts out of it and manipulate this using XML methods. After manipulating, re-insert the manipulated XML into the source Office Open XML ZIP archive. Using this approach, the source Office Open XML file stays fully intact as it was, even if the programmed API currently not provides methods to manipulate all kinds of possible parts of the Office Open XML file format. But this approach is resource consuming as it needs to hold the source Office Open XML ZIP archive in memory and to program all methods using XML objects directly without having the possibility to de-serialize objects into binary objects first, then manipulate this binary objects and only serialize these to XML again while saving the file. On the contrary, binary objects must be created in addition to the XML objects.

The other approach is to take the ZIP archive of the Office Open XML file, get de-serialized binary objects only for the XML parts out of it, which the API currently provides methods for and work using this binary objects further. After all, while saving the file, the binary objects get serialized to XML and put into a new version of the Office Open XML ZIP archive. This method is less resource consuming as there is no need to hold the source Office Open XML ZIP archive in memory and the manipulating works using memory saving binary objects. But, of course, this method not retains the source Office Open XML file.

OpenPyXL uses the second approach. Thus the resulting Office Open XML Excel-file only contains parts, the OpenPyXL API provides methods for. That's why the warnings. And that's why OpenPyXL only will fully retain the source Office Open XML Excel-file, when it in future will fully providing manipulating all parts of it. This is not yet the case.

The APIs of python-docx and python-pptx for example follow the first approach. That's why using these APIs to manipulate Office Open XML Word-files and/or PowerPoint-files fully retain the source files. But these APIs are more resource consuming.

In short, I doubt that you will be able to fully retain the source Office Open XML Excel-file after manipulating that using OpenPyXL.

Upvotes: 1

Related Questions