Kartik

Reputation: 8703

Why isn't there a proper datetime.time type in Pandas or numpy?

I have a dataset with people's departure time for work and the time they take to get where they work. Since people generally go to work every weekday, there obviously is no need for a date associated with the data. I leave for work at 8 AM every working day, and return at 5 PM every working day.

Similarly for schools, offices, etc. There are a number of places where date does not matter as much as time. There is also the converse, where time does not matter as much as date. Back to my problem.

My time is stored as an epoch value in minutes, and converting it to datetime is pretty easy:

In [1]: df['time'] = pd.to_datetime(df['time'], unit='m')
        df['time'].head(3)
Out[1]: 0    1970-01-01 06:15:00
        1    1970-01-01 06:17:00
        2    1970-01-01 08:10:00
        Name: time, dtype: datetime64[ns]

But there is the pesky 1970-01-01 in there. I want to get rid of it:

In [2]: df['time'].dt.time.head(3)
Out[2]: 0    06:15:00
        1    06:17:00
        2    08:10:00
        Name: time, dtype: object

Now it is converted into object, which is even peskier than having 1970-01-01, because I cannot do things like:

In [3]: df['time'].dt.time + pd.to_timedelta(df['travel'], unit='m')
Out[3]: ---------------------------------------------------------------------
        TypeError                           Traceback (most recent call last)
        < whole bunch of tracebacks. I know what's going on here >
        TypeError: ufunc subtract cannot use operands with types dtype('O') and dtype('<m8[ns]')

Then there is this numpy page, with tons of examples, but every single one of them has a date component; none have only the time component. For example, I quote:

>>> np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')

The story repeats in this Pandas page. There are numerous examples with only a date component, but not a single example with only a time component.

Why the lack of love for storing pure time in a manipulatable format? Do I have to resort to converting all of my data into Python's native datetime.time type (which will kill me because I have billions of rows to process)? What I am looking for is a way to store only the time component in a manipulatable format. An answer which sheds light in that direction will be accepted.

Upvotes: 2

Views: 2745

Answers (1)

Kartik

Reputation: 8703

Since @unutbu has not posted an answer to this question, but only commented on it, I shall post what worked and accept it as the answer. If @unutbu later posts an answer, I shall accept that instead.

Basically, as I mention in the question, the date component of the datetime does not matter to me for this task. Therefore, the simplest solution is to do the arithmetic first, while the column is still datetime64, and then extract just the time:

(df['time'] + pd.to_timedelta(df['travel'], unit='m')).dt.time
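
For anyone who wants to try this, here is a minimal, self-contained sketch of the one-liner above. The sample values are hypothetical, chosen so the first three departure times match the ones in the question. The second half is an alternative that is not part of the original comment thread, so treat it as an assumption: it keeps the time of day as a timedelta offset from midnight, which stays vectorised instead of falling back to object dtype.

```python
import datetime

import pandas as pd

# Hypothetical sample data: 'time' is minutes since the epoch (here simply
# minutes past midnight), 'travel' is the commute length in minutes.
df = pd.DataFrame({"time": [375, 377, 490], "travel": [30, 45, 20]})

# Approach from this answer: add on datetime64 first, then drop the date.
departure = pd.to_datetime(df["time"], unit="m")
arrival = departure + pd.to_timedelta(df["travel"], unit="m")
arrival_time = arrival.dt.time  # object dtype holding datetime.time values

# Alternative (an assumption, not from the original answer): store the time
# of day as a timedelta from midnight so further arithmetic stays vectorised.
departure_td = pd.to_timedelta(df["time"], unit="m")
arrival_td = departure_td + pd.to_timedelta(df["travel"], unit="m")

print(arrival_time.tolist())
print(arrival_td.tolist())
```

One difference to keep in mind: if a sum crosses midnight, the datetime route rolls over to the next day and .dt.time shows the wrapped clock time, whereas the timedelta route simply reports a value larger than 24 hours.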

Upvotes: 2
