Kierk
Kierk

Reputation: 498

openpyxl delete rows based on condition

I am trying to iterate through the rows in a spreadsheet and delete rows where column 7 is equal to 'Prospect Enterprise'.

I can't quite figure out the logic. Appreciate someone helping me figure it out.

I have tried this:

from openpyxl import load_workbook
wb1 = load_workbook('C:\\transfer\\AllSitesOpen.xlsx')
ws1 = wb1.active
for row in ws1.iter_rows():
    if row[6].value == 'Prospect Enterprise':
        #print(row)
        ws1.delete_rows(row)
wb1.save('C:\\transfer\\AllSitesOpen_ProspectDeleted.xlsx')

and get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-50c694086280> in <module>
      5     if row[6].value == 'Prospect Enterprise':
      6         #print(row)
----> 7         ws1.delete_rows(row)
      8 wb1.save('C:\\transfer\\AllSitesOpen_ProspectDeleted.xlsx')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in delete_rows(self, idx, amount)
    706         """
    707 
--> 708         remainder = _gutter(idx, amount, self.max_row)
    709 
    710         self._move_cells(min_row=idx+amount, offset=-amount, row_or_col="row")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in _gutter(idx, offset, max_val)
    877     range(cells_to_delete) > range(cell_to_be_moved)
    878     """
--> 879     gutter = range(max(max_val+1-offset, idx), min(idx+offset, max_val)+1)
    880     return gutter

TypeError: '>' not supported between instances of 'tuple' and 'int'

But the print rows does print out the row I want to delete:

from openpyxl import load_workbook
wb1 = load_workbook('C:\\transfer\\AllSitesOpen.xlsx')
ws1 = wb1.active
for row in ws1.iter_rows():
    if row[6].value == 'Prospect Enterprise':
        print(row)
        #ws1.delete_rows(row)
#wb1.save('C:\\transfer\\AllSitesOpen_ProspectDeleted.xlsx')
(<Cell 'report1571089659791'.A5>, <Cell 'report1571089659791'.B5>, <Cell 'report1571089659791'.C5>, <Cell 'report1571089659791'.D5>, <Cell 'report1571089659791'.E5>, <Cell 'report1571089659791'.F5>, <Cell 'report1571089659791'.G5>, <Cell 'report1571089659791'.H5>, <Cell 'report1571089659791'.I5>, <Cell 'report1571089659791'.J5>, <Cell 'report1571089659791'.K5>, <Cell 'report1571089659791'.L5>, <Cell 'report1571089659791'.M5>)
(<Cell 'report1571089659791'.A8>, <Cell 'report1571089659791'.B8>, <Cell 'report1571089659791'.C8>, <Cell 'report1571089659791'.D8>, <Cell 'report1571089659791'.E8>, <Cell 'report1571089659791'.F8>, <Cell 'report1571089659791'.G8>, <Cell 'report1571089659791'.H8>, <Cell 'report1571089659791'.I8>, <Cell 'report1571089659791'.J8>, <Cell 'report1571089659791'.K8>, <Cell 'report1571089659791'.L8>, <Cell 'report1571089659791'.M8>)
(<Cell 'report1571089659791'.A18>, <Cell 'report1571089659791'.B18>, <Cell 'report1571089659791'.C18>, <Cell 'report1571089659791'.D18>, <Cell 'report1571089659791'.E18>, <Cell 'report1571089659791'.F18>, <Cell 'report1571089659791'.G18>, <Cell 'report1571089659791'.H18>,

And so on

I have also tried:

from openpyxl import load_workbook
wb1 = load_workbook('C:\\transfer\\AllSitesOpen.xlsx')
ws1 = wb1.active
for row in ws1.iter_rows(min_col=7, max_col=7, min_row=2, max_row = None):
    for cell in row:
        if cell.value == 'Prospect Enterprise':
            ws1.delete_rows(row)
            #print(row)
wb1.save('C:\\transfer\\AllSitesOpen_ProspectDeleted.xlsx')

but get this exception:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-57e660d2328f> in <module>
      6     for cell in row:
      7         if cell.value == 'Prospect Enterprise':
----> 8             ws1.delete_rows(row)
      9             #print(row)
     10 wb1.save('C:\\transfer\\AllSitesOpen_ProspectDeleted.xlsx')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in delete_rows(self, idx, amount)
    706         """
    707 
--> 708         remainder = _gutter(idx, amount, self.max_row)
    709 
    710         self._move_cells(min_row=idx+amount, offset=-amount, row_or_col="row")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in _gutter(idx, offset, max_val)
    877     range(cells_to_delete) > range(cell_to_be_moved)
    878     """
--> 879     gutter = range(max(max_val+1-offset, idx), min(idx+offset, max_val)+1)
    880     return gutter

TypeError: '>' not supported between instances of 'tuple' and 'int'

Upvotes: 0

Views: 2230

Answers (1)

mgrollins
mgrollins

Reputation: 651

According to this answer https://stackoverflow.com/a/49370237/3911355 and the documentation it looks like you want to pass the index (idx) of the row to delete, not the row itself. So, you might want to have an enum on your row loop, or have some other way to track which row number you want to delete.

It might be better to construct a list of the row numbers that meet your deletion criteria, and delete them all at once after going through the worksheet once.

Upvotes: 1

Related Questions