Joshua Fox
Joshua Fox

Reputation: 19675

How do I find the last nonblank cell in Openpyxl?

Openpyxl can tell me the max_row and max_col, the "Used Range" of an Excel sheet. However, this range can include cells with no content, if they were formerly selected or altered.

I want to know the last column and last row that have content.

(Discussion for VBA, here.)

For example, if - here represents blanks in Used Range and _ means blanks outside Used Range, I want to choose the column marked with b and the row marked with c, even though Openpyxl and would, in calculating max_row and max_col, include the rows/columns with the dashes.

aaaaa---__
aaaaa-b-__
aaaaa---__
--------__
--c-----__
--------__
__________
__________

Upvotes: 4

Views: 11661

Answers (1)

Grismar
Grismar

Reputation: 31354

I found that openpyxl does report the correct values for max_row and max_col for files that have been saved, but if you manipulate the contents of a sheet and need these values before saving, the problem still holds.

There is no built-in way of doing this, so your best option is to search the rows and columns yourself, preferably limiting your search by starting at the reported values and searching up and to the left.

The worksheet object allows you to access rows individually, but individual columns can only be accessed through .itercols(). Whether this is faster that scanning all columns in one loop will depend on how empty you expect the sheet to be.

from openpyxl import load_workbook
wb = load_workbook('test.xlsx')
wb.worksheets[0]['h6'] = None

print((wb.worksheets[0].max_row, wb.worksheets[0].max_column))

def find_edges(sheet):
    row = sheet.max_row
    while row > 0:
        cells = sheet[row]
        if all([cell.value is None for cell in cells]):
            row -= 1
        else:
            break
    if row == 0:
        return 0, 0

    column = sheet.max_column
    while column > 0:
        cells = next(sheet.iter_cols(min_col=column, max_col=column, max_row=row))
        if all([cell.value is None for cell in cells]):
            column -= 1
        else:
            break
    return row, column

print(find_edges(wb.worksheets[0]))

In this example, I load an Excel sheet that has exactly the data you suggested, with a value also still in H6, which is removed on line 3.

It first prints the max_row and max_column as reported by openpyxl and then calls find_edges with the sheet, to find the actual values required.

For large sheets with very little data, you may want to experiment for speed with replacing the column scan by simply iterating over all columns, once you'd determined the last row (to limit size), like this:

columns = sheet.iter_cols(max_row=row)
column = 1
ci = 1
while True:
    try:
        cells = next(columns)
        if not all([cell.value is None for cell in cells]):
            column = ci
        ci += 1
    except StopIteration:
        break

But I would expect that the first way is fastest for most useful use cases.

If you prefer short over readable:

def find_edges2(sheet):
    def row():
        for r in range(sheet.max_row, 0, -1):
            if not all([cell.value is None for cell in sheet[r]]):
               return r

    row = row()
    if not row:
        return 0, 0

    def column():
        for c in range(sheet.max_column, 0, -1):
            if not all([cell.value is None for cell in next(sheet.iter_cols(min_col=c, max_col=c, max_row=row))]):
                return c

    return row, column()

Upvotes: 10

Related Questions