zsh_18
zsh_18

Reputation: 1202

Comparing the datetime objects - Python

I have a data file which looks like the following:

Date          item purchased
01-12-2018      Car
02-12-2018      Truck
03-12-2018      Car
04-12-2018      Bike

As a part of the data cleaning process, I need to check that the data is in chronological order. So I need to check that the date in a row is after the date in the previous row. If not, I need to delete that row. I am instructed to not use the pandas library.

So far, I have done the following steps so far:



#If the file name is - 'Input_file'

from openpyxl import load_workbook
from datetime import datetime

#Reading the file 
wb = load_workbook(Input_file)
sheet = wb.active

#Reading the Date column in the file
Date_column = sheet['A']


#Reading each row and the date in each row to compare it with the previous row date
for x in range(len(Date_column)):
    Datenow = Date_column[x].value

    Datebef= Date_column[x-1].value
    
    Check = Datenow > Datebef

    print(Check)

The error is get when I try to compare the datetime objects is :


TypeError: unsupported operand type(s) for -: 'str' and 'datetime.datetime'

The issue is when I check the type of the date column, it comes out to be datetime.datetime but as soon as I try to compare both the datetime.datetime objects, it tells me that one is string and one is datetime.datetime object. The confusion is if both the values are being read from the same column, How one is coming out as string and one is datetime.datetime.

How can I make sure the values stay as datetime.datetime and i can compare them.

Thanks

Upvotes: 1

Views: 209

Answers (1)

Charlie Clark
Charlie Clark

Reputation: 19527

You can check the data type of the cell before you compare it. Also, you will need to think carefully about comparisons and the order of the deleting rows because you cannot do this while you go. Instead you will need to create a list of rows to delete and them delete them in reverse order.

Something like this should work.

import datetime

previous_date = datetime.date(2017, 12, 31) # adjust as necessary
rows_to_delete = []

for row in ws.iter_rows(min_col=1, max_col=1, min_row=2):
    cell = row[0]
    if not isinstance(cell.value, datetime.date):
        continue
    if cell.value < previous_date:
        rows_to_delete.append(cell.row)
    previous_date = cell.value

for row in reversed(rows_to_delete):
   ws.delete_rows(row)

Upvotes: 1

Related Questions