Reputation: 19675
Openpyxl can tell me the max_row
and max_col
, the "Used Range" of an Excel sheet. However, this range can include cells with no content, if they were formerly selected or altered.
I want to know the last column and last row that have content.
For example, if -
here represents blanks in Used Range and _
means blanks outside Used Range, I want to choose the column marked with b
and the row marked with c
, even though Openpyxl and would, in calculating max_row
and max_col
, include the rows/columns with the dashes.
aaaaa---__
aaaaa-b-__
aaaaa---__
--------__
--c-----__
--------__
__________
__________
Upvotes: 4
Views: 11661
Reputation: 31354
I found that openpyxl does report the correct values for max_row and max_col for files that have been saved, but if you manipulate the contents of a sheet and need these values before saving, the problem still holds.
There is no built-in way of doing this, so your best option is to search the rows and columns yourself, preferably limiting your search by starting at the reported values and searching up and to the left.
The worksheet object allows you to access rows individually, but individual columns can only be accessed through .itercols()
. Whether this is faster that scanning all columns in one loop will depend on how empty you expect the sheet to be.
from openpyxl import load_workbook
wb = load_workbook('test.xlsx')
wb.worksheets[0]['h6'] = None
print((wb.worksheets[0].max_row, wb.worksheets[0].max_column))
def find_edges(sheet):
row = sheet.max_row
while row > 0:
cells = sheet[row]
if all([cell.value is None for cell in cells]):
row -= 1
else:
break
if row == 0:
return 0, 0
column = sheet.max_column
while column > 0:
cells = next(sheet.iter_cols(min_col=column, max_col=column, max_row=row))
if all([cell.value is None for cell in cells]):
column -= 1
else:
break
return row, column
print(find_edges(wb.worksheets[0]))
In this example, I load an Excel sheet that has exactly the data you suggested, with a value also still in H6
, which is removed on line 3.
It first prints the max_row
and max_column
as reported by openpyxl
and then calls find_edges
with the sheet, to find the actual values required.
For large sheets with very little data, you may want to experiment for speed with replacing the column scan by simply iterating over all columns, once you'd determined the last row (to limit size), like this:
columns = sheet.iter_cols(max_row=row)
column = 1
ci = 1
while True:
try:
cells = next(columns)
if not all([cell.value is None for cell in cells]):
column = ci
ci += 1
except StopIteration:
break
But I would expect that the first way is fastest for most useful use cases.
If you prefer short over readable:
def find_edges2(sheet):
def row():
for r in range(sheet.max_row, 0, -1):
if not all([cell.value is None for cell in sheet[r]]):
return r
row = row()
if not row:
return 0, 0
def column():
for c in range(sheet.max_column, 0, -1):
if not all([cell.value is None for cell in next(sheet.iter_cols(min_col=c, max_col=c, max_row=row))]):
return c
return row, column()
Upvotes: 10