Reputation: 75
import openpyxl
infoFilePath='test.xlsx'
workbook = openpyxl.load_workbook(infoFilePath,read_only=True)
first_sheet = workbook.get_sheet_names()[0]
worksheet = workbook.get_sheet_by_name(first_sheet)
workbook.active
def iter_rows(ws):
for row in ws.iter_rows():
yield [cell.value for cell in row]
rows = list(iter_rows(worksheet))
print(rows)
As you see, I've success with the print 'rows' as a list. But how to print 'columns' as a list?? when I used ws.iter_cols(), error occurred.
Upvotes: 0
Views: 3182
Reputation: 22942
You are using a read-only workbook.
The problem is that Worksheet.iter_rows()
and Worksheet.iter_cols()
methods mutate the internal structure of the worksheet by dynamically creating the "missing" cells, using the Worksheet.cell()
.
In fact, a Worksheet
instance stores all non-empty cells in a dict
, where keys are (row_idx, col_idx)
tuples, and values are Cell
instances. If a cell is empty (no value, no style), it is not store in the dict
.
If you load your workbook with read_only = False
, you can iterate the worksheet without mutate it. You can access the _cells
protected attribute.
You can use this function to calculate the bounding box of cells in a worksheet:
def calculate_indexes(cells):
# Really efficient and low memory consuming (was profiled) algorithm
min_row_idx, min_col_idx = next(iter(cells), (1, 1))
max_row_idx, max_col_idx = min_row_idx, min_col_idx
for row_idx, col_idx in cells:
min_col_idx = min_col_idx if min_col_idx < col_idx else col_idx
min_row_idx = min_row_idx if min_row_idx < row_idx else row_idx
max_col_idx = max_col_idx if col_idx < max_col_idx else col_idx
max_row_idx = max_row_idx if row_idx < max_row_idx else row_idx
return min_col_idx, min_row_idx, max_col_idx, max_row_idx
Where cells is the list of cell coordinates:
You can then implement an iter_cols
like this:
def iter_cols(ws):
min_col_idx, min_row_idx, max_col_idx, max_row_idx = calculate_indexes(ws._cells.keys())
for col_idx in range(min_col_idx, max_col_idx + 1):
yield [(ws._cells[(row_idx, col_idx)] if (row_idx, col_idx) in ws._cells else None)
for row_idx in range(min_row_idx, max_row_idx + 1)]
for col in iter_cols(ws):
print(col)
Upvotes: 3
Reputation: 1113
openpyxl is a old module to handle old excel files(xlsx/xlsm/xltx/xltm), the iter_cols()
is not enabled in the file access mode your are currently using. you can eighter change the access mode read only. you can use print settings.
Or else you can use many other modules like: Here are some options to choose from:
Upvotes: 1