Jason
Jason

Reputation: 75

How to make a list of columns in openpyxl in python

import openpyxl

infoFilePath='test.xlsx'

workbook = openpyxl.load_workbook(infoFilePath,read_only=True)
first_sheet = workbook.get_sheet_names()[0]
worksheet = workbook.get_sheet_by_name(first_sheet)
workbook.active

def iter_rows(ws):
    for row in ws.iter_rows():
        yield [cell.value for cell in row]

rows = list(iter_rows(worksheet))
print(rows)

As you see, I've success with the print 'rows' as a list. But how to print 'columns' as a list?? when I used ws.iter_cols(), error occurred.

Upvotes: 0

Views: 3182

Answers (2)

Laurent LAPORTE
Laurent LAPORTE

Reputation: 22942

You are using a read-only workbook.

The problem is that Worksheet.iter_rows() and Worksheet.iter_cols() methods mutate the internal structure of the worksheet by dynamically creating the "missing" cells, using the Worksheet.cell().

In fact, a Worksheet instance stores all non-empty cells in a dict, where keys are (row_idx, col_idx) tuples, and values are Cell instances. If a cell is empty (no value, no style), it is not store in the dict.

If you load your workbook with read_only = False, you can iterate the worksheet without mutate it. You can access the _cells protected attribute.

You can use this function to calculate the bounding box of cells in a worksheet:

def calculate_indexes(cells):
    # Really efficient and low memory consuming (was profiled) algorithm
    min_row_idx, min_col_idx = next(iter(cells), (1, 1))
    max_row_idx, max_col_idx = min_row_idx, min_col_idx
    for row_idx, col_idx in cells:
        min_col_idx = min_col_idx if min_col_idx < col_idx else col_idx
        min_row_idx = min_row_idx if min_row_idx < row_idx else row_idx
        max_col_idx = max_col_idx if col_idx < max_col_idx else col_idx
        max_row_idx = max_row_idx if row_idx < max_row_idx else row_idx
    return min_col_idx, min_row_idx, max_col_idx, max_row_idx

Where cells is the list of cell coordinates:

You can then implement an iter_cols like this:

def iter_cols(ws):

    min_col_idx, min_row_idx, max_col_idx, max_row_idx = calculate_indexes(ws._cells.keys())

    for col_idx in range(min_col_idx, max_col_idx + 1):
        yield [(ws._cells[(row_idx, col_idx)] if (row_idx, col_idx) in ws._cells else None)
               for row_idx in range(min_row_idx, max_row_idx + 1)]

for col in iter_cols(ws):
    print(col)

Upvotes: 3

learncode
learncode

Reputation: 1113

openpyxl is a old module to handle old excel files(xlsx/xlsm/xltx/xltm), the iter_cols() is not enabled in the file access mode your are currently using. you can eighter change the access mode read only. you can use print settings.

Or else you can use many other modules like: Here are some options to choose from:

Upvotes: 1

Related Questions