Reputation: 1202
I have a data file which looks like the following:
Date item purchased
01-12-2018 Car
02-12-2018 Truck
03-12-2018 Car
04-12-2018 Bike
As a part of the data cleaning process, I need to check that the data is in chronological order. So I need to check that the date in a row is after the date in the previous row. If not, I need to delete that row. I am instructed to not use the pandas library.
So far, I have done the following steps so far:
#If the file name is - 'Input_file'
from openpyxl import load_workbook
from datetime import datetime
#Reading the file
wb = load_workbook(Input_file)
sheet = wb.active
#Reading the Date column in the file
Date_column = sheet['A']
#Reading each row and the date in each row to compare it with the previous row date
for x in range(len(Date_column)):
Datenow = Date_column[x].value
Datebef= Date_column[x-1].value
Check = Datenow > Datebef
print(Check)
The error is get when I try to compare the datetime objects is :
TypeError: unsupported operand type(s) for -: 'str' and 'datetime.datetime'
The issue is when I check the type of the date column, it comes out to be datetime.datetime but as soon as I try to compare both the datetime.datetime objects, it tells me that one is string and one is datetime.datetime object. The confusion is if both the values are being read from the same column, How one is coming out as string and one is datetime.datetime.
How can I make sure the values stay as datetime.datetime and i can compare them.
Thanks
Upvotes: 1
Views: 209
Reputation: 19527
You can check the data type of the cell before you compare it. Also, you will need to think carefully about comparisons and the order of the deleting rows because you cannot do this while you go. Instead you will need to create a list of rows to delete and them delete them in reverse order.
Something like this should work.
import datetime
previous_date = datetime.date(2017, 12, 31) # adjust as necessary
rows_to_delete = []
for row in ws.iter_rows(min_col=1, max_col=1, min_row=2):
cell = row[0]
if not isinstance(cell.value, datetime.date):
continue
if cell.value < previous_date:
rows_to_delete.append(cell.row)
previous_date = cell.value
for row in reversed(rows_to_delete):
ws.delete_rows(row)
Upvotes: 1