seymourgoestohollywood
seymourgoestohollywood

Reputation: 1167

MemoryError Using openpyxl to write 500k+ rows

I have a script that opens a template xlsx file using openpyxl, and then makes across each of six sheets, adds in some data from lists produced earlier in the script and changes the format of the cells.

The issue I have is that there will be instances where on one sheet, I need to write 9 columns and 500k+ rows, and this is giving me a MemoryError.

Traceback (most recent call last):
  File "C:\python27\labs\labs\sqrdist\new_main_ui.py", line 667, in request_and_send_reports
    x = sqr_pull.main()
  File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 801, in wrapper
    val = prof(func)(*args, **kwargs)
  File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 445, in f
    result = func(*args, **kwds)
  File "C:\python27\labs\labs\sqrdist\sqr_pull.py", line 327, in main
    os.remove(temp_attach_filepath)
  File "build\bdist.win32\egg\openpyxl\workbook\workbook.py", line 281, in save
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 214, in save_workbook
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 197, in save
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 109, in write_data
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 134, in _write_worksheets
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 281, in write_worksheet
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 381, in write_worksheet_data
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 404, in write_cell
  File "build\bdist.win32\egg\openpyxl\xml\functions.py", line 142, in start_tag
  File "C:\Python27\lib\xml\sax\saxutils.py", line 159, in startElement
    self._write(u' %s=%s' % (name, quoteattr(value)))
  File "C:\Python27\lib\xml\sax\saxutils.py", line 104, in write
    self.flush()
MemoryError

The code that I think is causing this is the following, where KeywordReport is a list of lists.

ws_keywords = wb.get_sheet_by_name("E_KWs")
for r, row in enumerate(KeywordReport, start=1):
    for c, val in enumerate(row, start=1):
        mycell = ws_keywords.cell(row=r, column=c)
        mycell.value = val
        mycell.style = Style(border=thin_border)

ws_keywords.column_dimensions['A'].width = 60.0
ws_keywords.column_dimensions['B'].width = 50.0
ws_keywords.column_dimensions['C'].width = 50.0
ws_keywords.column_dimensions['D'].width = 15.0
ws_keywords.column_dimensions['E'].width = 16.0
ws_keywords.column_dimensions['F'].width = 16.0
ws_keywords.column_dimensions['G'].width = 16.0

for ref in ['A1','B1','C1','D1','E1','F1','G1']:
    cell = ws_keywords.cell(ref)
    cell.style = Style(font=Font(bold=True),fill=PatternFill(patternType='solid', fgColor=Color('ffd156')), border=thin_border)

gc.collect()
del KeywordReport[:]
gc.collect()

print "start of save"
wb.save(attach_filepath)
gc.collect()

os.remove(temp_attach_filepath)
QCoreApplication.processEvents()

I've looked at http://openpyxl.readthedocs.org/en/latest/optimized.html however I don't think I can use this to write without just dumping into a new workbook, but I need the data in my existing template.

Is there a way around this?

Upvotes: 2

Views: 4289

Answers (1)

Charlie Clark
Charlie Clark

Reputation: 19497

500k rows shouldn't really be too much of a problem. But I guess it also depends how many worksheets you have. How much memory do you have on the system?

Installing lxml will be faster (as will creating any styles outside of loops) but I wouldn't expect it to bring memory use down much.

If you do need to copy in data from an existing workbook you might want to think about using a separate workbook for the changes which would allow you to reduce memory use both reading and writing. Further discussion is probably best on the mailing list.

Upvotes: 3

Related Questions