Abhishek Shankhadhar

Reputation: 31

Difference between two dates in Pandas DataFrame

I have many columns in a DataFrame, and I need to find the time difference between two columns named in_time and out_time and put it in a new column in the same DataFrame.

The timestamps are formatted like this: 2015-09-25T01:45:34.372Z.

I am using a pandas DataFrame.

I want to do something like this:

df.days = df.out_time - df.in_time


The DataFrame already has many columns, and I need to add one more, named days, that holds the differences.

Upvotes: 3

Views: 62501

Answers (3)

Adrien Riaux

Reputation: 533

First of all, you need to convert in_time and out_time columns to datetime type.

import pandas as pd

for col in ('in_time', 'out_time'):  # convert both columns in place
    df[col] = pd.to_datetime(df[col])

You can check the type using dtype:

df['in_time'].dtype

This should give datetime64[ns, UTC], since the trailing Z is parsed as UTC.
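A minimal, self-contained sketch of the conversion step, using two made-up rows in the question's format (the sample values are assumptions, not from the question). Because the exact resolution shown in the dtype can vary between pandas versions, the check below only confirms that the column is timezone-aware and in UTC.

```python
import pandas as pd

# Hypothetical sample data in the question's format
df = pd.DataFrame({
    'in_time': ['2015-09-25T01:45:34.372Z', '2015-09-25T01:45:34.372Z'],
    'out_time': ['2015-09-25T01:45:37.372Z', '2015-09-26T01:45:34.372Z'],
})

# Convert both string columns to datetime
for col in ('in_time', 'out_time'):
    df[col] = pd.to_datetime(df[col])

# The trailing 'Z' makes pandas treat the values as UTC, so the dtype is timezone-aware
in_dtype = df['in_time'].dtype
print(in_dtype)
```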

Now you can subtract them and get the time difference, either as whole days with .dt.days or as fractional days by dividing by NumPy's np.timedelta64(1, 'D').

Example:

import numpy as np

# Whole days only; leftover hours and minutes are dropped
df['days'] = (df['out_time'] - df['in_time']).dt.days
# Or fractional days (e.g. 12 hours -> 0.5)
df['days'] = (df['out_time'] - df['in_time']) / np.timedelta64(1, 'D')
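To make the difference between the two options concrete, here is a small sketch with hypothetical rows: a 12-hour gap and a 2-day gap. .dt.days keeps only the whole-day component, while dividing by a one-day timedelta keeps the fraction.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: 12 hours apart, and exactly 2 days apart
df = pd.DataFrame({
    'in_time': pd.to_datetime(['2015-09-25T01:45:34.372Z', '2015-09-25T01:45:34.372Z']),
    'out_time': pd.to_datetime(['2015-09-25T13:45:34.372Z', '2015-09-27T01:45:34.372Z']),
})

delta = df['out_time'] - df['in_time']  # timedelta64 Series

whole_days = delta.dt.days                       # 12 hours -> 0, 2 days -> 2
fractional_days = delta / np.timedelta64(1, 'D')  # 12 hours -> 0.5, 2 days -> 2.0
```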

Upvotes: 1

danielhadar

Reputation: 2161

Well, it all kinda depends on the time format you use. I'd recommend converting the columns to datetime objects.

If in_time and out_time are currently strings, convert them with datetime.strptime():

from datetime import datetime

f = lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')
df.in_time = df.in_time.apply(f)
df.out_time = df.out_time.apply(f)

and then you can simply subtract them, and assign the result to a new column named 'days':

df['days'] = df.out_time - df.in_time
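As a possibly faster alternative to apply with strptime, pd.to_datetime accepts the same format string and converts a whole column at once. A sketch, assuming the same '%Y-%m-%dT%H:%M:%S.%fZ' layout and hypothetical sample rows. Because the Z is matched as a literal character here, the result is timezone-naive, just like the strptime version.

```python
import pandas as pd

FMT = '%Y-%m-%dT%H:%M:%S.%fZ'  # same format string as the strptime lambda

# Hypothetical sample data
df = pd.DataFrame({
    'in_time': ['2015-09-25T01:45:34.372Z', '2015-09-25T01:45:34.372Z'],
    'out_time': ['2015-09-25T01:45:37.372Z', '2015-09-26T01:45:34.372Z'],
})

# Vectorized parse of each column (no per-row Python calls)
df['in_time'] = pd.to_datetime(df['in_time'], format=FMT)
df['out_time'] = pd.to_datetime(df['out_time'], format=FMT)

# Subtracting gives a timedelta64 column
df['days'] = df['out_time'] - df['in_time']
```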

Example (differences of 3 seconds and 1 day):

In[5]: df = pd.DataFrame({'in_time':['2015-09-25T01:45:34.372Z','2015-09-25T01:45:34.372Z'],
                          'out_time':['2015-09-25T01:45:37.372Z','2015-09-26T01:45:34.372Z']})
In[6]: df
Out[6]: 
                    in_time                  out_time
0  2015-09-25T01:45:34.372Z  2015-09-25T01:45:37.372Z
1  2015-09-25T01:45:34.372Z  2015-09-26T01:45:34.372Z

In[7]: type(df.loc[0,'in_time'])
Out[7]: str

In[8]: df.in_time = df.in_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))   
In[9]: df.out_time = df.out_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))

In[10]: df    # notice that it looks almost the same (the T and Z are gone), but the type is different
Out[10]: 
                  in_time                out_time
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372

In[11]: type(df.loc[0,'in_time'])
Out[11]: pandas.tslib.Timestamp

And the creation of the new column:

In[12]: df['days'] = df.out_time - df.in_time
In[13]: df
Out[13]: 
                  in_time                out_time            days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372 0 days 00:00:03
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372 1 days 00:00:00

Now you can change the output format. For example, to get the difference in minutes (the column keeps the name days):

In[14]: df.days = df.days.apply(lambda x: x.total_seconds()/60)
In[15]: df
Out[15]: 
                  in_time                out_time     days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372     0.05
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372  1440.00
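The same conversion can be done without apply, since the .dt.total_seconds() accessor works on a whole timedelta column. A sketch that rebuilds the two example differences (3 seconds and 1 day) and expresses them in minutes and hours; the minutes and hours column names are illustrative choices, not from the answer above.

```python
import pandas as pd

# The two differences from the example above, rebuilt as a timedelta column
df = pd.DataFrame({'days': pd.to_timedelta(['0 days 00:00:03', '1 days 00:00:00'])})

seconds = df['days'].dt.total_seconds()  # vectorized, returns floats
df['minutes'] = seconds / 60   # 3 s -> 0.05, 1 day -> 1440.0
df['hours'] = seconds / 3600   # 1 day -> 24.0
```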

Note: Regarding the in_time and out_time format, I made some assumptions, for example that you're using a 24-hour clock (hence %H rather than %I). To adjust the format, have a look at the strptime() documentation.
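A quick way to sanity-check a format string before applying it to a whole column is to parse a single sample value with the standard library. A small sketch: %f accepts one to six digits, so the three-digit .372 becomes 372000 microseconds, and a value that does not match the format raises ValueError.

```python
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%fZ'

# '.372' is read as 372 milliseconds, stored as 372000 microseconds
parsed = datetime.strptime('2015-09-25T01:45:34.372Z', FMT)
print(parsed.microsecond)

# A value without the fractional part does not match this format
try:
    datetime.strptime('2015-09-25T01:45:34Z', FMT)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```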

Note 2: It would be better to design your program to use datetime objects from the beginning, instead of storing strings and converting them.

Upvotes: 4

EdChum

Reputation: 394459

You need to convert the strings to datetime dtype. You can then subtract any date you want and call dt.days on the resulting Series:

In [15]:
df = pd.DataFrame({'date':['2015-09-25T01:45:34.372Z']})
df

Out[15]:
                       date
0  2015-09-25T01:45:34.372Z

In [19]:
import datetime as dt
df['date'] = pd.to_datetime(df['date'])
df['day'] = (df['date'] - dt.datetime.now()).dt.days
df

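Out[19]:
                     date  day
0 2015-09-25 01:45:34.372 -252

Note: recent pandas versions parse the trailing Z as UTC, so the date column is timezone-aware and subtracting the naive dt.datetime.now() raises a TypeError. Subtract a timezone-aware timestamp such as pd.Timestamp.now(tz='UTC') instead.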
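Because the result depends on when now() is called, the -252 above will differ on every run. A reproducible sketch that subtracts a fixed reference date instead; the reference date 2016-01-01 is an arbitrary choice for illustration, created as UTC so it can be subtracted from the timezone-aware column. It also shows that .dt.days rounds negative differences down, so the sign of the subtraction matters.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2015-09-25T01:45:34.372Z']})
df['date'] = pd.to_datetime(df['date'])  # timezone-aware (UTC)

ref = pd.Timestamp('2016-01-01', tz='UTC')  # arbitrary fixed reference date (assumption)

# date - ref is -97 days 22:14:25.628, which pandas normalizes to -98 days + 01:45:34.372
df['day'] = (df['date'] - ref).dt.days             # -98
df['days_until_ref'] = (ref - df['date']).dt.days  # 97 whole days
```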

Upvotes: 9
