Reputation: 1167
I have a script that opens a template xlsx file using openpyxl, and then makes across each of six sheets, adds in some data from lists produced earlier in the script and changes the format of the cells.
The issue I have is that there will be instances where on one sheet, I need to write 9 columns and 500k+ rows, and this is giving me a MemoryError
.
Traceback (most recent call last):
File "C:\python27\labs\labs\sqrdist\new_main_ui.py", line 667, in request_and_send_reports
x = sqr_pull.main()
File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 801, in wrapper
val = prof(func)(*args, **kwargs)
File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 445, in f
result = func(*args, **kwds)
File "C:\python27\labs\labs\sqrdist\sqr_pull.py", line 327, in main
os.remove(temp_attach_filepath)
File "build\bdist.win32\egg\openpyxl\workbook\workbook.py", line 281, in save
File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 214, in save_workbook
File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 197, in save
File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 109, in write_data
File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 134, in _write_worksheets
File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 281, in write_worksheet
File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 381, in write_worksheet_data
File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 404, in write_cell
File "build\bdist.win32\egg\openpyxl\xml\functions.py", line 142, in start_tag
File "C:\Python27\lib\xml\sax\saxutils.py", line 159, in startElement
self._write(u' %s=%s' % (name, quoteattr(value)))
File "C:\Python27\lib\xml\sax\saxutils.py", line 104, in write
self.flush()
MemoryError
The code that I think is causing this is the following, where KeywordReport
is a list of lists.
ws_keywords = wb.get_sheet_by_name("E_KWs")
for r, row in enumerate(KeywordReport, start=1):
for c, val in enumerate(row, start=1):
mycell = ws_keywords.cell(row=r, column=c)
mycell.value = val
mycell.style = Style(border=thin_border)
ws_keywords.column_dimensions['A'].width = 60.0
ws_keywords.column_dimensions['B'].width = 50.0
ws_keywords.column_dimensions['C'].width = 50.0
ws_keywords.column_dimensions['D'].width = 15.0
ws_keywords.column_dimensions['E'].width = 16.0
ws_keywords.column_dimensions['F'].width = 16.0
ws_keywords.column_dimensions['G'].width = 16.0
for ref in ['A1','B1','C1','D1','E1','F1','G1']:
cell = ws_keywords.cell(ref)
cell.style = Style(font=Font(bold=True),fill=PatternFill(patternType='solid', fgColor=Color('ffd156')), border=thin_border)
gc.collect()
del KeywordReport[:]
gc.collect()
print "start of save"
wb.save(attach_filepath)
gc.collect()
os.remove(temp_attach_filepath)
QCoreApplication.processEvents()
I've looked at http://openpyxl.readthedocs.org/en/latest/optimized.html however I don't think I can use this to write without just dumping into a new workbook, but I need the data in my existing template.
Is there a way around this?
Upvotes: 2
Views: 4289
Reputation: 19497
500k rows shouldn't really be too much of a problem. But I guess it also depends how many worksheets you have. How much memory do you have on the system?
Installing lxml will be faster (as will creating any styles outside of loops) but I wouldn't expect it to bring memory use down much.
If you do need to copy in data from an existing workbook you might want to think about using a separate workbook for the changes which would allow you to reduce memory use both reading and writing. Further discussion is probably best on the mailing list.
Upvotes: 3