Openpyxl. Ignoring cells that have style but have no values

Question

I'm wondering is there a way to make openpyxl to ignore cells without a values? Cells are in xml tree, but have no values, only styles defined

The problem is, I have a sheet, which xml structure is like this:

When openpyxl gets to the row 18 and tries to get cells, the way it generates cells make it to fill the gap between cells I18 and XFD18 with 16336 Empty cells. And same goes for rows 18 - 1048576, I'm getting a million of empty rows because of the way rows in XML tree are being parsed in ws.iter_rows method. I see logic beihind it and I understand that it's a problem of xlsx file itself, but I want to know whether it's something I can deal with by using openpyxl API?

I know that xlrd ignores empty cells formatting (it's just not implemented) and thus xlrd won't add such cells to the row. Is there a way to make openpyxl act the same? Looking at source code, I see no way of doing that using existing API, except from defining max_column and max_row for ws.iter_rows method (which is not optimal, since actual number of columns/rows may vary)

I managed to accomplish the task modifying the xml_source like that (wb is opened as read_only):

import io
import xml.etree.ElementTree as ET

from openpyxl.xml.functions import iterparse, safe_iterator
from openpyxl.xml.constants import SHEET_MAIN_NS


ROW_TAG = '{%s}row' % SHEET_MAIN_NS
CELL_TAG = '{%s}c' % SHEET_MAIN_NS
VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
SHEET_DATA = '{%s}sheetData' % SHEET_MAIN_NS

def clean_up_xml_from_empty_rows_and_cells(sheet):
    sheet_data = None

    rows_to_delete = set()
    p = iterparse(sheet.xml_source, tag=[ROW_TAG], remove_blank_text=True)
    for _event, element in p:

        if element.tag == SHEET_DATA:
            sheet_data = element
            continue

        if element.tag == ROW_TAG:
            cells_to_delete = set()
            for cell in safe_iterator(element, CELL_TAG):
                if cell.findtext(VALUE_TAG) is None:
                    cells_to_delete.add(cell)

            for cell in cells_to_delete:
                element.remove(cell)

            if not len(element):
                rows_to_delete.add(element)

    for row in rows_to_delete:
        sheet_data.remove(row)

    out_file = io.BytesIO()

    element_tree = ET.ElementTree(p.root)
    element_tree.write(out_file, encoding='utf8')
    out_file.seek(0)

    sheet.xml_source = out_file

But it feels so wrong! And so dangerous, I'm absolutely unsure the function like that won't break the parsing of other xlsx files.

Is there really no way to ignore cells without a values? Browsing SO and google - it feels like noone ever encountered such problems :(

Charlie Clark · Accepted Answer

You're probably best off patching parse_row() which is where this stuff is controlled. This is easier to work with and thest than messing with the parser.

Openpyxl. Ignoring cells that have style but have no values

Answers (1)

Related Questions