Reputation: 2019
I have two columns, fromdate
and todate
, in a dataframe.
import pandas as pd
data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
df = pd.DataFrame(data)
I add a new column, diff
, to find the difference between the two dates using
df['diff'] = df['fromdate'] - df['todate']
I get the diff
column, but it contains days
, when there's more than 24 hours.
todate fromdate diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000
How do I convert my results to only hours and minutes (i.e. days are converted to hours)?
Upvotes: 169
Views: 481035
Reputation: 8996
This was driving me bonkers as the .astype()
solution above didn't work for me. But I found another way. Haven't timed it or anything, but might work for others out there:
t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')
print pd.Timedelta(t2 - t1).total_seconds() / 3600.0
...if you want hours. Or:
print pd.Timedelta(t2 - t1).total_seconds() / 60.0
...if you want minutes.
UPDATE: There used to be a helpful comment here that mentioned using .total_seconds()
for time periods spanning multiple days. Since it's gone, I've updated the answer.
Upvotes: 81
Reputation: 62523
days + hours
. Minutes are not included.hh:mm
or x hours y minutes
, would require additional calculations and string formatting.timedelta
math, and is faster than using .astype('timedelta64[h]')
.
pandas v2.0.0
, .astype('timedelta64[h]')
is not allowed.timedelta
objects: See supported operations.datetime64[ns] dtype
. It is required that all relevant columns are converted using pandas.to_datetime()
.python 3.11.2
, pandas 2.0.1
, numpy 1.24.3
import pandas as pd
# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)
# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date) # also works
df['time_delta'] = (df.from_date - df.to_date)
# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)
# create a colume with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)
# display(df)
to_date from_date time_delta tot_hour_diff tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000 58.636061 3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000 3.684528 221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000 8.714933 522.896000
.total_seconds()
was added and merged when the core developer was on vacation, and would not have been approved.
.total_xx
methods.# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0 211089.82
1 13264.30
2 31373.76
dtype: float64
# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0 2
1 0
2 0
dtype: int64
# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0 38289
1 13264
2 31373
dtype: int64
dateutil
maintainer:
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
(df.from_date - df.to_date).dt.total_seconds() / 3600
dateutil
module provides powerful extensions to the standard datetime
module.%%timeit
testimport pandas as pd
# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)
%timeit (df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
24.2 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit (df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'
Upvotes: 70
Reputation: 23439
By default, time difference in pandas is in nanosecond resolution, i.e. timedelta64[ns]
, so one way to convert it into seconds/minutes/hours/etc. is to divide its nanosecond representation by 10**9
to convert to seconds, by 60*10**9
for minutes etc. This method is at least 3 times faster than other methods suggested on this page.1
df['diff_in_seconds'] = df['from_date'].sub(df['to_date']).view('int64') // 10**9
df['diff_in_minutes'] = df['from_date'].sub(df['to_date']).view('int64') // (60*10**9)
df['diff_in_hours'] = df['from_date'].sub(df['to_date']).view('int64') // (3600*10**9)
PS: The above code assumes that you want the difference in whole seconds, minutes, hours etc. so it uses integer division (//
) but if you want the fractions as well, then use true division (/
) instead. That said, if you want the exact difference, then instead of fractional seconds/minutes/hours, consider converting the difference into higher resolution (milliseconds/microseconds/etc.)
1 Some benchmarks using Trenton McKinney's setup:
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')]*1000000,
'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]*1000000}
df = pd.DataFrame(data)
df['Diff'] = df['from_date'] - df['to_date']
%timeit df['Diff'].view('int64') // (3600*10**9)
# 11 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['Diff'] // pd.Timedelta(hours=1)
# 36.7 ms ± 2.99 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['Diff'].astype('timedelta64[h]')
# 46.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['Diff'].dt.total_seconds() // 3600
# 169 ms ± 7.71 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 3
Reputation: 7358
Pandas timestamp differences returns a datetime.timedelta object. This can easily be converted into hours by using the *as_type* method, like so
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')
to yield,
0 58
1 3
2 8
dtype: float64
Upvotes: 198