Reputation: 11
Longtime lurker, first time poster. This forum is an amazing resource but I'm officially stumped.
I have a large and complex xlsm file that I need to manipulate using openpyxl but my final file ends up corrupted every time. Essentially, the program is a file splitter. I am making copies of a master xlsm file using shutil.copy, using openpyxl to modify the file to include only User A's data, saving, and repeating for User B, C, etc.
The master file is 11MB, houses data in multiple Excel tables (not just a range of cells, those two things are quite different), uses a data model, and has pivot tables that auto-update upon opening the file, charts, and slicers that control the charts.
Shutil.copy works fine and I can open the file successfully. I have isolated the issue to opening the copied file using openpyxl, saving, and reopening (disregarding any formatting I need to do). This is when I get the corruption error. I have been successful using a sample file that uses all of the same features on a much smaller scale (135KB) but my file always ends up corrupted when I use the big 11MB file.
Current code snippet:
import openpyxl
#Identify source file and destination
file = "C:\\blah blah\\sourcefile_usera.xlsm"
wb = openpyxl.load_workbook(file, read_only=False, keep_vba=True)
wb.save(file)
wb.close()
Excel error message: "We found a problem with some content in 'sourcefile_usera.xlsm'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Clicking Yes causes Excel to get stuck in a loop of trying to open the file and I have to force-close Excel through my task manager to get it to stop.
I welcome suggestions for how to address what I think is a file size issue or if anyone can tell me how to successfully write a pandas dataframe back into an existing Excel table while protecting everything else in the file (multiple hidden tabs, pivot tables, charts, data model, etc.).
What I've tried:
Upvotes: 0
Views: 640
Reputation: 20450
It sounds like you have N sheets within a workbook. And they contain some fairly challenging formatting details.
Simplify your problem:
Now you're in a better position to notice exactly where things fell apart. And you can conveniently truncate input files to identify the initial troublesome row, using a binary search approach.
Upvotes: 0