Luciano

pandas : reduction of timedelta64 using sum() results in int64?

According to the pandas 0.13.1 manual, you can reduce a numpy timedelta64 series:

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-deltas-reductions

This works fine with, for example, mean():

In [107]:
pd.Series(np.random.randint(0,100000,100).astype("timedelta64[ns]")).mean()
Out[107]:
0   00:00:00.000047
dtype: timedelta64[ns]

However, sum() always returns an integer:

In [108]:
pd.Series(np.random.randint(0,100000,100).astype("timedelta64[ns]")).sum()
Out[108]:
5047226

Is this a bug, or is something like overflow causing this? Is it safe to cast the result to timedelta64? How would I work around this?

I am using numpy 1.8.0.


Answers (1)

Jeff

Looks like a bug; I just filed it: https://github.com/pydata/pandas/issues/6462
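Overflow is unlikely to be the cause. timedelta64[ns] values are stored as 64-bit integer nanosecond counts, so the type can represent spans of roughly ±292 years, while the data in the question (100 values, each below 100000 ns) sums to at most about 10 ms. A minimal check with NumPy, assuming a 365.25-day year for the conversion:

```python
import numpy as np

# timedelta64[ns] is backed by int64 nanosecond counts,
# so the largest representable span is the int64 maximum in ns.
max_ns = np.iinfo(np.int64).max
ns_per_year = 365.25 * 24 * 3600 * 10**9  # assumes a 365.25-day year
print(max_ns / ns_per_year)  # roughly 292 (years)

# Worst case for the question's data: 100 values, each at most 99999 ns.
worst_case_sum = 100 * 99999
print(worst_case_sum < max_ns)
```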

The integer result is a count of nanoseconds, so as a workaround you can convert it back with pd.to_timedelta:

In [1]: s = pd.to_timedelta(range(4),unit='d')

In [2]: s
Out[2]: 
0   0 days
1   1 days
2   2 days
3   3 days
dtype: timedelta64[ns]

In [3]: s.mean()
Out[3]: 
0   1 days, 12:00:00
dtype: timedelta64[ns]

In [4]: s.sum()
Out[4]: 518400000000000

In [8]: pd.to_timedelta([s.sum()])
Out[8]: 
0   6 days
dtype: timedelta64[ns]
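The same conversion can also produce a scalar instead of a one-element Series. The sketch below rebuilds the integer nanosecond total that the affected sum() returned (by summing the raw int64 counts directly, so it does not depend on which pandas version is installed) and re-wraps it with np.timedelta64, pd.Timedelta, or pd.to_timedelta(..., unit="ns"). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the session above: 0, 1, 2 and 3 days.
s = pd.Series(pd.to_timedelta(list(range(4)), unit="D"))

# Sum the raw nanosecond counts, mirroring the plain integer
# that the affected pandas version returned from s.sum().
ns = s.to_numpy().astype("timedelta64[ns]").astype("int64")
total_ns = int(ns.sum())  # 518400000000000

# Re-wrap the integer as a duration; each form interprets it as nanoseconds.
as_numpy = np.timedelta64(total_ns, "ns")
as_pandas = pd.Timedelta(total_ns)  # integer input defaults to nanoseconds
as_pandas_explicit = pd.to_timedelta(total_ns, unit="ns")

print(as_numpy)
print(as_pandas)
```

Passing the unit explicitly (as in the last form) makes the nanosecond assumption visible in the code rather than relying on a default.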
