Reputation: 67
I have a dataframe and I want to add a column containing time difference between two another column:
df[Diff] = df['End Time'] - df['Open Time']
df[Diff]
0 0 days 01:25:40
1 0 days 00:41:57
2 0 days 00:21:47
3 0 days 16:41:57
4 0 days 04:32:00
5 0 days 03:01:57
6 0 days 01:37:56
7 0 days 01:13:57
8 0 days 01:07:56
9 0 days 02:33:59
10 29 days 18:33:53
11 0 days 03:50:56
12 0 days 01:57:56
I would like to have this column in format '1h 25m', so I have tried to calculate days in hours:
diff = df['End Time'] - df['Open Time']
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
and received these results:
0 1
1 0
2 0
3 16
4 4
5 3
6 1
7 1
8 1
9 2
10 714
11 3
12 1
dtype: int64h 0 25
1 41
2 21
3 41
4 32
5 1
6 37
7 13
8 7
9 33
10 33
11 50
12 57
Name: minutes, dtype: int64m
But I can't express these results in this format in new column:
'{}h {}m'.format(hours,minutes))
Upvotes: 2
Views: 920
Reputation: 863501
You can use total_seconds
for convert timedelta
to seconds and then count hours
, minutes
and also seconds, what is 10 times faster as using dt.components
:
s = diff.dt.total_seconds().astype(int)
hours = s // 3600
# remaining seconds
s = s - (hours * 3600)
# minutes
minutes = s // 60
# remaining seconds
seconds = s - (minutes * 60)
a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0 1h 25
1 0h 41
2 0h 21
3 16h 41
4 4h 32
5 3h 1
6 1h 37
7 1h 13
8 1h 7
9 2h 33
10 714h 33
11 3h 50
12 1h 57
Name: Diff, dtype: object
Zero comment solutions:
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 18h 33m
11 3h 50m
12 1h 57m
dtype: object
Another:
a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
dtype: object
Timings:
#13000 rows
df = pd.concat([df]*1000).reset_index(drop=True)
In [191]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
1 loop, best of 3: 483 ms per loop
In [192]: %%timeit
...: s = diff.dt.total_seconds().astype(int)
...:
...: hours = s // 3600
...: # remaining seconds
...: s = s - (hours * 3600)
...: # minutes
...: minutes = s // 60
...: # remaining seconds
...: seconds = s - (minutes * 60)
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
10 loops, best of 3: 43.9 ms per loop
In [193]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
...:
1 loop, best of 3: 465 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ solution
In [194]: %%timeit
...: c = diff.dt.components[['days', 'hours', 'minutes']]
...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
...:
1 loop, best of 3: 208 ms per loop
Upvotes: 1
Reputation: 403050
You could extract the relevant columns, convert to str
using astype
, and just concat the cols as needed.
c = (df['End Time'] - df['Open Time'])\
.dt.components[['days', 'hours', 'minutes']]
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
df['diff']
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
Name: diff, dtype: object
Upvotes: 2