Reputation: 41
While loading an MS Excel Workbook with Data Validation in read/write mode using openpyxl, the time consumed is much higher than loading a workbook without DV. The time consumed grows exponentially as the DV's are more and sometime i get MemoryError too.
Here is a sample that explains the time consumption in seconds:
Without Data validation - sample_nodv.xlsx
python -c "from time import time; from openpyxl.reader.excel import load_workbook; st = time(); load_workbook('sample_nodv.xlsx'); print(time()-st)"
Output: 0.004000425338745117
With Data validation - sample_dv.xlsx
python -c "from time import time; from openpyxl.reader.excel import load_workbook; st = time(); load_workbook('sample_dv.xlsx'); print(time()-st)"
Output: 4.132413148880005
I would like to use openpyxl and load the xlsx file in RW mode and at the same time reduce the time consumed. I don't want to use win32com as i want the solution to be independent of OS.
Note:
Upvotes: 1
Views: 980
Reputation: 19527
This is a known issue for validators that are applied to entire columns. See https://bitbucket.org/openpyxl/openpyxl/issues/837 for further information. openpyxl version 2.5b1 contains a fix for this.
Upvotes: 1