Reputation: 3095
I have a pandas dataframe with a datetimeindex. I would like to create a column that contains the elapsed time. I'm calculating it like this:
startTime = df.index[0]
elapsed = df.index - startTime
Result:
TypeError Traceback (most recent call last)
<ipython-input-56-279fd541b1e2> in <module>()
----> 1 df.index - startTime
C:\Python27\lib\site-packages\pandas\tseries\index.pyc in __sub__(self, other)
612 return self.shift(-other)
613 else: # pragma: no cover
--> 614 raise TypeError(other)
615
616 def _add_delta(self, delta):
TypeError: 2014-07-14 14:47:57
The weird thing is that for example:
df.index[1] - startTime
returns:
datetime.timedelta(0, 1)
I thought the problem might be that it's a DatetimeIndex and not a plain Series. However, when I first create a new Series with df.index as the data argument and then attempt the subtraction, I get a whole load of warnings saying that I'm implicitly casting between two incompatible types and that this will not work in the future:
timeStamps = pd.Series(data=df.index)
elapsed = timeStamps - timeStamps[0]
returns
C:\Python27\lib\site-packages\pandas\core\format.py:1851: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional.
elif format_short and x == 0:
Although I do get a correct series of timedeltas with the latter method, I don't like to rely on deprecated behaviour. Is there a 'proper' way to calculate elapsed times?
Here is a piece of the csv-file that I get the data from:
Timestamp Bubbler_Temperature_Setpoint
14-7-2014 14:47:57 13.000000
14-7-2014 14:47:58 13.000000
14-7-2014 14:47:59 13.000000
14-7-2014 14:48:00 13.000000
14-7-2014 14:48:01 13.000000
14-7-2014 14:48:02 13.000000
14-7-2014 14:48:03 13.000000
14-7-2014 14:48:04 13.000000
14-7-2014 14:48:05 13.000000
I read it into a dataframe with the 'read_csv' function:
df = pd.read_csv('test.csv',sep='\t',parse_dates='Timestamp',index_col='Timestamp')
I'm using pandas version 0.13.1
Upvotes: 4
Views: 7980
Reputation: 1329
I just changed
elapsed = df.index - startTime
to
df['elapsed'] = df.index - startTime
to get the time change column. Isn't that all you need?
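As a minimal sketch of that assignment, assuming a recent pandas release (not the 0.13.1 from the question, where the subtraction raises the TypeError shown above): the data below is the first few rows of the sample file, and the names csv_text, start_time and elapsed_seconds are only illustrative.

```python
import io

import pandas as pd

# First rows of the sample data from the question, tab-separated.
csv_text = (
    "Timestamp\tBubbler_Temperature_Setpoint\n"
    "14-7-2014 14:47:57\t13.000000\n"
    "14-7-2014 14:47:58\t13.000000\n"
    "14-7-2014 14:47:59\t13.000000\n"
    "14-7-2014 14:48:00\t13.000000\n"
)

df = pd.read_csv(io.StringIO(csv_text), sep="\t")
# The dates are day-first, so the format is spelled out explicitly.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d-%m-%Y %H:%M:%S")
df = df.set_index("Timestamp")

# On recent pandas, DatetimeIndex minus a single Timestamp is a TimedeltaIndex.
start_time = df.index[0]
df["elapsed"] = df.index - start_time

# Optional: the same elapsed time as a plain float number of seconds.
df["elapsed_seconds"] = df["elapsed"].dt.total_seconds()
```

The elapsed column holds timedelta values; the float column is usually easier to plot or to feed into further calculations.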
Upvotes: 1
Reputation: 129078
You are de facto doing this:
In [30]: ts = Series(13,date_range('20140714 14:47:57',periods=10,freq='s'))
In [31]: ts
Out[31]:
2014-07-14 14:47:57 13
2014-07-14 14:47:58 13
2014-07-14 14:47:59 13
2014-07-14 14:48:00 13
2014-07-14 14:48:01 13
2014-07-14 14:48:02 13
2014-07-14 14:48:03 13
2014-07-14 14:48:04 13
2014-07-14 14:48:05 13
2014-07-14 14:48:06 13
Freq: S, dtype: int64
# iirc this is available in 0.13.1 (if not, use ``Series(ts.index)``)
In [32]: x = ts.index.to_series()
In [33]: x-x.iloc[0]
Out[33]:
2014-07-14 14:47:57 00:00:00
2014-07-14 14:47:58 00:00:01
2014-07-14 14:47:59 00:00:02
2014-07-14 14:48:00 00:00:03
2014-07-14 14:48:01 00:00:04
2014-07-14 14:48:02 00:00:05
2014-07-14 14:48:03 00:00:06
2014-07-14 14:48:04 00:00:07
2014-07-14 14:48:05 00:00:08
2014-07-14 14:48:06 00:00:09
Freq: S, dtype: timedelta64[ns]
Doing df.index - df.index[0] in your example is NOT a timedelta operation, but a SET operation, which is why it raises the TypeError instead of returning timedeltas. See here
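For reference, the same steps as a plain script rather than an IPython session. This is only a sketch: it assumes pandas is imported as pd and that the .dt accessor is available, which was added after 0.13.1, and the names idx and elapsed_seconds are made up for the example.

```python
import pandas as pd

# Same toy data as in the session above: ten timestamps, one second apart.
idx = pd.date_range("2014-07-14 14:47:57", periods=10, freq="s")
ts = pd.Series(13, index=idx)

# Turn the index into a Series so that "-" is element-wise datetime arithmetic.
x = ts.index.to_series()
elapsed = x - x.iloc[0]

# Elapsed time as floats, in seconds.
elapsed_seconds = elapsed.dt.total_seconds()
```

The result keeps the original timestamps as its index, so it can be assigned straight back to a DataFrame that shares that index.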
Upvotes: 1