Reputation: 1902
I have about 1500 excel spreadsheet files downloaded from movescount.com - these contain heart rate data taken over a few years. I want to process these files with a bit of Python + openpyxl. The problem is that opening the excel file with openpyxl fails. Specifically it complains that a styles.xml definition is missing. Also Apple’s Numbers cannot open the file. When I open the file with MS Excel (which works) and simply save the file right away, I can open and process it with python and openpyxl just fine.
Clearly, MS Excel is repairing some format issue. But I would like to avoid having to open and save the file with MS Excel - particularly because that means I need to do this on an MS Windows box. What is the best way to repair these files on the fly on my OS X machine?
Upvotes: 2
Views: 1987
Reputation: 45542
An Excel xlsx file is a zipped archive of files. So you can write the missing file to the xlsx archive provided you have one good file to start with.
import zipfile
with zipfile.ZipFile('/path/to/goodfile.xlsx', 'r') as zgood:
styles_xml = zgood.read('xl/styles.xml')
with zipfile.ZipFile('/path/to/badfile.xlsx', 'a') as zbad:
zbad.writestr('xl/styles.xml', styles_xml)
Upvotes: 1