Reputation: 407
I have a data frame that I use to calculate the blockTime column that is the difference between endDate and startDate. It gives me the results like 0 days 01:45:00 but I need to have the decimal number just for hours. In this case 1.75.
My df is as follows:
import pandas as pd
data = {'endDate': ['01/10/2020 15:23', '01/10/2020 16:31', '01/10/2020 16:20', '01/10/2020 11:00'],
'startDate': ['01/10/2020 13:38', '01/10/2020 14:49', '01/10/2020 14:30','01/10/2020 14:30']
}
df = pd.DataFrame(data, columns = ['endDate','startDate'])
df['endDate'] = pd.to_datetime(df['endDate'])
df['startDate'] = pd.to_datetime(df['startDate'])
df['blockTime'] = (df['endDate'] - df['startDate'])
df = df.reindex(columns= ['startDate', 'endDate', 'blockTime'])
And the desired results would be a data frame like the one below. Note, if the minus value is produced I need to somehow highlight it as incorrect. I thought -999 might be ideal.
startDate endDate blockTime desiredResult
2020-01-10 13:38:00 2020-01-10 15:23:00 0 days 01:45:00 1.75
2020-01-10 14:49:00 2020-01-10 16:31:00 0 days 01:42:00 1.70
2020-01-10 14:30:00 2020-01-10 16:20:00 0 days 01:50:00 1.83
2020-01-10 14:30:00 2020-01-10 11:00:00 -1 days +20:30:00 -999.00
Upvotes: 1
Views: 512
Reputation: 25489
That is just the way the timedelta
object is represented when you print the dataframe. If you just want to save the number of hours as a float
instead of the entire timedelta
object, timedelta
objects have a total_seconds()
function you can use like so:
def td2hours(tdobject):
if tdobject.total_seconds() < 0:
return -999
return tdobject.total_seconds() / 3600
df['blockTime']= (df['endDate'] - df['startDate']).apply(td2hours)
Or, as Gustavo suggested in the comments, you can avoid using apply()
. This is faster when you have large datasets:
blockTime = ((df['endDate'] - df['startDate']).dt.total_seconds() / 3600).to_numpy()
blockTime[blockTime < 0] = -999
df['blockTime'] = blockTime
Output:
endDate startDate blockTime
0 2020-01-10 15:23:00 2020-01-10 13:38:00 1.750000
1 2020-01-10 16:31:00 2020-01-10 14:49:00 1.700000
2 2020-01-10 16:20:00 2020-01-10 14:30:00 1.833333
3 2020-01-10 11:00:00 2020-01-10 14:30:00 -999.000000
Upvotes: 2