kronosapiens

Reputation: 1441

Preventing pandas from coercing datetime.timedelta to numpy.timedelta64 during Series operations?

I am attempting to parse a large set of keystroke data, and have run into a datatype challenge trying to create a column for time elapsed between key press and release.

My goal is to create a column of timedeltas of the datetime.timedelta type. The parser flow is as follows:

  1. Convert the press and release times from a string to a datetime using pandas.to_datetime; this returns a Timestamp datatype.
  2. Subtract the press time from the release time to get the duration of the keypress, as a datetime.timedelta type.
  3. (Later) Use the timedelta.total_seconds() method to get the duration in seconds as a float, for further analysis.

I'm running into a problem at step 2. When I subtract two individual Timestamps in the interpreter, I get a datetime.timedelta (which is what I want). But when I subtract two Series of Timestamps, the resulting Series holds numpy.timedelta64 values! Does anyone know why pandas returns this datatype for Series subtraction, while individual subtraction gives me a datetime.timedelta?

Thank you so much!

I've pasted my debugging session below, where I first do the conversion manually on a single row, and then the same conversion using Series-wide operators.

Manually:

In : touchtime = task_dataframe['TouchTime'].ix[0]
In : touchtime
Out[1]: u'07:01:00.891'
In : releasetime = task_dataframe['ReleaseTime'].ix[0]
In : releasetime
Out[1]: u'07:01:00.950'
In : import pandas as pd
In : touchtime = pd.to_datetime(touchtime)
In : touchtime
Out[1]: Timestamp('2014-05-30 07:01:00.891000', tz=None)
In : releasetime = pd.to_datetime(releasetime)
In : releasetime
Out[1]: Timestamp('2014-05-30 07:01:00.950000', tz=None)
In : holdtime = releasetime - touchtime
In : holdtime
Out[1]: datetime.timedelta(0, 0, 59000)

Series-wide:

In : task_dataframe['TouchTime'] = task_dataframe['TouchTime'].map(lambda x: pd.to_datetime(x))
In : task_dataframe['ReleaseTime'] = task_dataframe['ReleaseTime'].map(lambda x: pd.to_datetime(x))
In : releasetime2 = task_dataframe['ReleaseTime'].ix[0]
In : releasetime2
Out[1]: Timestamp('2014-05-30 07:01:00.950000', tz=None) # Same output as above
In : releasetime == releasetime2
Out[1]: True # Showing equivalence
In : task_dataframe['HoldTime'] = task_dataframe['ReleaseTime'] - task_dataframe['TouchTime']
In : holdtime2 = task_dataframe['HoldTime'].ix[0]
In : holdtime2
Out[1]: numpy.timedelta64(59000000,'ns')
In : holdtime == holdtime2
Out[1]: False # Non-equivalent

Upvotes: 1

Views: 1994

Answers (1)

Jeff

Reputation: 129018

pandas stores timedeltas internally as timedelta64[ns] (in a numpy array). This is a much more efficient representation for computation, as it's basically an integer.
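To make the "basically an integer" point concrete, here is a minimal sketch using the two timestamps from the question. The variable names are made up for illustration, and element access is version-dependent: the question's session returned a raw numpy.timedelta64, while recent pandas versions should hand back a pd.Timedelta, which is a datetime.timedelta subclass.

```python
import datetime

import pandas as pd

# One-row Series built from the timestamps shown in the question.
release = pd.Series(pd.to_datetime(["2014-05-30 07:01:00.950"]))
touch = pd.Series(pd.to_datetime(["2014-05-30 07:01:00.891"]))

hold = release - touch
print(hold.dtype)  # a timedelta64 dtype, not object

# View the stored values as plain integers (cast to nanoseconds first so the
# unit is explicit regardless of the resolution pandas picked).
raw_ns = hold.to_numpy().astype("timedelta64[ns]").astype("int64")
print(raw_ns[0])  # 59000000 nanoseconds, i.e. 59 ms

# In recent pandas versions a single element comes back as pd.Timedelta,
# which compares equal to the matching datetime.timedelta.
element = hold.iloc[0]
print(element == datetime.timedelta(milliseconds=59))
```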

You can convert frequency here.
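As a rough sketch of what such a conversion can look like (this is one way to do it, not necessarily what the link above described): the DataFrame below is invented sample data that mirrors the question's TouchTime and ReleaseTime columns, and the explicit format string is an assumption about how the strings are laid out.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's columns.
df = pd.DataFrame({
    "TouchTime": ["07:01:00.891", "07:01:02.100"],
    "ReleaseTime": ["07:01:00.950", "07:01:02.350"],
})

# Parse whole columns at once instead of mapping to_datetime over each value.
touch = pd.to_datetime(df["TouchTime"], format="%H:%M:%S.%f")
release = pd.to_datetime(df["ReleaseTime"], format="%H:%M:%S.%f")

# The difference is a timedelta64 Series.
df["HoldTime"] = release - touch

# Seconds as floats, without ever needing datetime.timedelta objects.
df["HoldSeconds"] = df["HoldTime"].dt.total_seconds()

# Dividing by a Timedelta converts to any other unit, here milliseconds.
df["HoldMillis"] = df["HoldTime"] / pd.Timedelta(milliseconds=1)

# If plain datetime.timedelta objects are really required, convert per element.
py_deltas = [td.to_pytimedelta() for td in df["HoldTime"]]

print(df[["HoldSeconds", "HoldMillis"]])
```

The total_seconds() step from the original plan carries over unchanged; it is simply applied to the whole column through the .dt accessor rather than to each timedelta individually.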

What are you ultimately trying to do?

Upvotes: 1
