Reputation: 4008
I want to delete the rows from an excel file, knowing the values. I use openpyxl:
key_values_list
is list with numbers (all are present in the excel file, on column)
wb = load_workbook(src)
sheet = wb['Sheet 1']
for i in range(2, sheet.max_row + 1):
if sheet.cell(row=i, column=1).value in key_values_list:
sheet.delete_rows(i, 1)
wb.save(src)
The code above doesn't delete all corresponding rows
Upvotes: 1
Views: 19003
Reputation: 61
Another approach would be to reverse the loop. Since the lines are deleted which mixes up your indexing it makes sense to reverse the order:
rows = list(sheet.iter_rows(min_row=1, max_row=sheet.max_row))
for row in reversed(rows):
if row[0].row == 1:
break
if row[5].value != filterBy:
sheet.delete_rows(row[0].row, 1)
Upvotes: 1
Reputation: 13888
You'll always find a problem with deleting direct elements within a for loop. Consider this code for a sheet with 12 rows and its respective row value:
for i in range(1, sh.max_row + 1):
print(sh.cell(row=i, column=1).value)
# 1 .. 12
Now see what happens when you start deleting stuff:
for i in range(1, sh.max_row + 1):
if sh.cell(row=i, column=1).value in [5,6,7]:
sh.delete_rows(i, 1)
print(f'i = {i}\tcell value (i, 1) is {sh.cell(row=i, column=1).value}')
# i = 1 cell value (i, 1) is 1
# i = 2 cell value (i, 1) is 2
# i = 3 cell value (i, 1) is 3
# i = 4 cell value (i, 1) is 4
# i = 5 cell value (i, 1) is 5
# i = 6 cell value (i, 1) is 7
# i = 7 cell value (i, 1) is 9
# i = 8 cell value (i, 1) is 10
# i = 9 cell value (i, 1) is 11
# i = 10 cell value (i, 1) is 12
# i = 11 cell value (i, 1) is None
# i = 12 cell value (i, 1) is None
You can see that during the i in [5, 6, 7]
, the shifting of the rows started at row 6 because row 5 was already deleted, making the original row 6 the new row 5, original row 7 new row 6... etc. So in the next iteration of i = 6
, the cell is actually referencing the value of row 7 from the original data. You effectively skipped iterating through row 6.
The easiest answer is use a while
loop, not for
:
i = 1
while i <= sh.max_row:
print(f'i = {i}\tcell value (i, 1) is {sh.cell(row=i, column=1).value}')
if sh.cell(row=i, column=1).value in [5,6,7]:
sh.delete_rows(i, 1)
# Note the absence of incremental. Because we deleted a row, we want to stay on the same row because new data will show in the next iteration.
else:
i += 1
# Because the check failed, we can safely increment to the next row.
# i = 1 cell value (i, 1) is 1
# i = 2 cell value (i, 1) is 2
# i = 3 cell value (i, 1) is 3
# i = 4 cell value (i, 1) is 4
# i = 5 cell value (i, 1) is 5 # deleted
# i = 5 cell value (i, 1) is 6 # deleted
# i = 5 cell value (i, 1) is 7 # deleted
# i = 5 cell value (i, 1) is 8
# i = 6 cell value (i, 1) is 9
# i = 7 cell value (i, 1) is 10
# i = 8 cell value (i, 1) is 11
# i = 9 cell value (i, 1) is 12
# verify the data has been deleted
for i in range(1, sh.max_row +1):
print(sh.cell(row=i, column=1).value)
# 1
# 2
# 3
# 4
# 8
# 9
# 10
# 11
# 12
You can now see that while i
doesn't reach 12, each row is iterated through because i=5
has been handled thrice.
If for whatever reason you must use a for
loop to iterate, you might want to consider some alternative methods such as iterating through a copy or doing it backward
Upvotes: 12