theQman

Reputation: 1780

How to extract hours from DataFrame/Series column of timedelta objects?

My series s looks something like this:

0   0 days 09:14:29.142000
1   0 days 00:01:08.060000
2   1 days 00:08:40.192000
3   0 days 17:52:18.782000
4   0 days 01:56:44.696000
dtype: timedelta64[ns]

I'm having trouble understanding how to pull out the hours (rounded to the nearest hour).

Edit:

I realize I can do something like s[0].hours, which gives me 9L. So I can do s[0].hours + 24*s[0].days and then round accordingly using the minutes.

How can I do this on the entire series at once?

Upvotes: 0

Views: 4378

Answers (2)

Jeff

Reputation: 128928

This is straight out of the docs, and it is vectorized.

In [16]: s
Out[16]: 
0   0 days 09:14:29.142000
1   0 days 00:01:08.060000
2   1 days 00:08:40.192000
3   0 days 17:52:18.782000
4   0 days 01:56:44.696000
Name: 0, dtype: timedelta64[ns]

In [17]: s.dt.components      
Out[17]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     0      9       14       29           142             0            0
1     0      0        1        8            60             0            0
2     1      0        8       40           192             0            0
3     0     17       52       18           782             0            0
4     0      1       56       44           696             0            0

In [18]: s.dt.components.hours
Out[18]: 
0     9
1     0
2     0
3    17
4     1
Name: hours, dtype: int64
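Note that components.hours only holds the hour-of-day part, so row 2 (1 day and 8 minutes) shows 0. If the goal is total hours rounded to the nearest hour, one possible sketch is to combine the components yourself. The half-up rule based on minutes >= 30 is my assumption about what "rounded" means here; the sample series is rebuilt from the question.

```python
import pandas as pd

# Rebuild the sample series from the question.
s = pd.Series(pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
]))

c = s.dt.components

# Whole hours including full days, plus one more hour when the
# minutes component is 30 or above (assumed half-up rounding).
total_hours = c.days * 24 + c.hours + (c.minutes >= 30).astype(int)

print(total_hours.tolist())  # [9, 0, 24, 18, 2]
```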

Here's another way to approach this if you don't need the actual hours attribute but rather the Timedelta expressed in another unit (this is called frequency conversion):

In [31]: s/pd.Timedelta('1h')
Out[31]: 
0     9.241428
1     0.018906
2    24.144498
3    17.871884
4     1.945749
dtype: float64

In [32]: np.ceil(s/pd.Timedelta('1h'))
Out[32]: 
0    10
1     1
2    25
3    18
4     2
dtype: float64
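np.ceil always rounds up, so 9.24 hours becomes 10. If "rounded to the nearest hour" is what's wanted, a minimal sketch is to round the converted float series instead. This assumes the same sample data as in the question; note that Series.round rounds exact halves to the nearest even number, which doesn't matter for these values.

```python
import pandas as pd

# Rebuild the sample series from the question.
s = pd.Series(pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
]))

# Frequency conversion: duration expressed as a float number of hours.
hours_float = s / pd.Timedelta("1h")

# Round to the nearest whole hour and store as integers.
rounded_hours = hours_float.round().astype(int)

print(rounded_hours.tolist())  # [9, 0, 24, 18, 2]
```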

Upvotes: 3

Jonathan Eunice

Reputation: 22443

Let's assume your time delta column there is called "Delta". Then you can do it this way:

df['rh'] = df.Delta.apply(lambda x: round(pd.Timedelta(x).total_seconds() \
                          % 86400.0 / 3600.0) )

Each time delta is really a numpy.timedelta64 under the covers. It helps to cast it to a pandas Timedelta which has more convenient methods. Here I just ask for the number of total seconds, lop off any multiples of 86400 (i.e. numbers that indicate full days), and divide by 3600 (number of seconds in an hour). That gives you a floating point number of hours, which you then round.

[Image: dataframe after update]

I assumed, btw, that you wanted just the hour, minutes, seconds, and partial seconds components considered in the rounded hours, but not the full days. If you want all the hours, including the days, just omit the modulo operation that lops off days:

df['rh2'] = df.Delta.apply(lambda x: round(pd.Timedelta(x).total_seconds() \
                           / 3600.0) )

Then you get:

[Image: dataframe after the alternate update]

It's also possible to do these calculations directly in numpy terms:

df['rh'] = df.Delta.apply(lambda x: round(x / np.timedelta64(1, 'h')) % 24 )
df['rh2'] = df.Delta.apply(lambda x: round(x / np.timedelta64(1, 'h')) )

Here np.timedelta64(1, 'h') is a one-hour duration, so dividing by it gives the number of hours as a float; the optional % 24 lops off whole-day components (if desired).
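The same two columns can probably be computed without apply by using the .dt accessor on the whole column. This is only a sketch: the frame and its "Delta" column are rebuilt here from the question's sample values, and the day-stripping is done on the seconds before rounding, as in the first snippet.

```python
import pandas as pd

# Hypothetical frame with the timedelta column named "Delta", as assumed above.
df = pd.DataFrame({"Delta": pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
])})

# Total duration of each row in seconds, as floats.
seconds = df["Delta"].dt.total_seconds()

# rh: drop full days (86400 s each), convert to hours, round.
df["rh"] = ((seconds % 86400) / 3600).round().astype(int)

# rh2: keep the days, so 1 day 00:08:40 counts as 24 hours.
df["rh2"] = (seconds / 3600).round().astype(int)

print(df["rh"].tolist())   # [9, 0, 0, 18, 2]
print(df["rh2"].tolist())  # [9, 0, 24, 18, 2]
```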

Upvotes: 0
