Paul
Paul

Reputation: 3087

Pandas timedelta64[ns] calculations

I have the following DataFrame:

>>> sampleDate = pd.date_range('2018-01-01', periods=10, freq='D 16H')
>>> duration = pd.TimedeltaIndex(data =['123 days 5 hours', '1 day 6 min',  '2 days', '23 hours 7 min','5 days 17 hours 3 min','18 min', '1 day 17 hours',  '22 day 2 min', '22 hours','15 min'])
>>> df = pd.DataFrame(data={'time': sampleDate, 'duration': duration})
>>> df = df.set_index('time').sort_index()
>>> df
                             duration
time                                 
2018-01-01 00:00:00 123 days 05:00:00
2018-01-02 16:00:00   1 days 00:06:00
2018-01-04 08:00:00   2 days 00:00:00
2018-01-06 00:00:00   0 days 23:07:00
2018-01-07 16:00:00   5 days 17:03:00
2018-01-09 08:00:00   0 days 00:18:00
2018-01-11 00:00:00   1 days 17:00:00
2018-01-12 16:00:00  22 days 00:02:00
2018-01-14 08:00:00   0 days 22:00:00
2018-01-16 00:00:00   0 days 00:15:00

I would like to do statistics on the different weeks or days of the year (e.g. mean duration during the first 7 days of the year). What I have tried is running

>>> df.rolling('7d').mean()

pandas.core.base.DataError: No numeric types to aggregate

What would it take to get it to pick up the right column?

Upvotes: 1

Views: 275

Answers (1)

jezrael
jezrael

Reputation: 862691

You can convert duration to native format by astype, count mean and convert back:

df['duration'] = df['duration'].astype(np.int64)
df = pd.to_timedelta(df.rolling('7d')['duration'].mean())
print (df)
time
2018-01-01 00:00:00   123 days 05:00:00
2018-01-02 16:00:00    62 days 02:33:00
2018-01-04 08:00:00    42 days 01:42:00
2018-01-06 00:00:00    31 days 19:03:15
2018-01-07 16:00:00    26 days 13:51:12
2018-01-09 08:00:00     1 days 22:30:48
2018-01-11 00:00:00     2 days 01:53:36
2018-01-12 16:00:00     6 days 01:54:00
2018-01-14 08:00:00     6 days 01:40:36
2018-01-16 00:00:00     4 days 22:19:00
Name: duration, dtype: timedelta64[ns]

Or convert timedeltas to seconds:

df['duration'] = df['duration'].dt.total_seconds()
df1 = df.rolling('7d')['duration'].mean()
print (df1)
time
2018-01-01 00:00:00    10645200.0
2018-01-02 16:00:00     5365980.0
2018-01-04 08:00:00     3634920.0
2018-01-06 00:00:00     2746995.0
2018-01-07 16:00:00     2296272.0
2018-01-09 08:00:00      167448.0
2018-01-11 00:00:00      179616.0
2018-01-12 16:00:00      525240.0
2018-01-14 08:00:00      524436.0
2018-01-16 00:00:00      425940.0
Name: duration, dtype: float64

Upvotes: 2

Related Questions