How to make a list of columns in openpyxl in python

Question

import openpyxl

infoFilePath='test.xlsx'

workbook = openpyxl.load_workbook(infoFilePath,read_only=True)
first_sheet = workbook.get_sheet_names()[0]
worksheet = workbook.get_sheet_by_name(first_sheet)
workbook.active

def iter_rows(ws):
    for row in ws.iter_rows():
        yield [cell.value for cell in row]

rows = list(iter_rows(worksheet))
print(rows)

As you see, I've success with the print 'rows' as a list. But how to print 'columns' as a list?? when I used ws.iter_cols(), error occurred.

Laurent LAPORTE · Accepted Answer

You are using a read-only workbook.

The problem is that Worksheet.iter_rows() and Worksheet.iter_cols() methods mutate the internal structure of the worksheet by dynamically creating the "missing" cells, using the Worksheet.cell().

In fact, a Worksheet instance stores all non-empty cells in a dict, where keys are (row_idx, col_idx) tuples, and values are Cell instances. If a cell is empty (no value, no style), it is not store in the dict.

If you load your workbook with read_only = False, you can iterate the worksheet without mutate it. You can access the _cells protected attribute.

You can use this function to calculate the bounding box of cells in a worksheet:

def calculate_indexes(cells):
    # Really efficient and low memory consuming (was profiled) algorithm
    min_row_idx, min_col_idx = next(iter(cells), (1, 1))
    max_row_idx, max_col_idx = min_row_idx, min_col_idx
    for row_idx, col_idx in cells:
        min_col_idx = min_col_idx if min_col_idx < col_idx else col_idx
        min_row_idx = min_row_idx if min_row_idx < row_idx else row_idx
        max_col_idx = max_col_idx if col_idx < max_col_idx else col_idx
        max_row_idx = max_row_idx if row_idx < max_row_idx else row_idx
    return min_col_idx, min_row_idx, max_col_idx, max_row_idx

Where cells is the list of cell coordinates:

You can then implement an iter_cols like this:

def iter_cols(ws):

    min_col_idx, min_row_idx, max_col_idx, max_row_idx = calculate_indexes(ws._cells.keys())

    for col_idx in range(min_col_idx, max_col_idx + 1):
        yield [(ws._cells[(row_idx, col_idx)] if (row_idx, col_idx) in ws._cells else None)
               for row_idx in range(min_row_idx, max_row_idx + 1)]

for col in iter_cols(ws):
    print(col)

How to make a list of columns in openpyxl in python

Answers (2)

Related Questions