ben890

Reputation: 1133

Change pd datetime object to integer

I have a pandas DataFrame with two date columns, and I want the difference between them in days. The resulting difference looks like a string (e.g. '7 days'). Is there a way to convert it to just the integer number of days?

y['datePulled'] = pd.to_datetime(y['datePulled'])
y['Dates'] = pd.to_datetime(y['Dates'])
y['Datediff'] = y['datePulled'] - y['Dates']
y['Datediff']
0    7 days
1    6 days
2    5 days
3    4 days
4    3 days
5    2 days
6    1 days

Upvotes: 1

Views: 2733

Answers (1)

jezrael

Reputation: 863291

You can use:

(y['Datediff'] / np.timedelta64(1, 'D')).astype(int)

Or:

y['Datediff'].dt.days

Sample:

import pandas as pd
import numpy as np

y = pd.DataFrame({ 'datePulled': ['2016-01-05','2016-01-04'], 
                    'Dates': ['2016-01-01','2016-01-02']})

y['datePulled'] = pd.to_datetime(y['datePulled'])
y['Dates'] = pd.to_datetime(y['Dates'])
y['Datediff'] = y['datePulled'] - y['Dates']
print (y)

# dividing by one day yields floats, so cast to int
y['Datediff1'] = (y['Datediff'] / np.timedelta64(1, 'D')).astype(int)

y['Datediff2'] = y['Datediff'].dt.days
print (y)
       Dates datePulled  Datediff  Datediff1  Datediff2
0 2016-01-01 2016-01-05    4 days          4          4
1 2016-01-02 2016-01-04    2 days          2          2
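
The two conversions agree on whole, non-negative day differences like the ones in this sample. As a minimal sketch, using made-up values that are not in the question, the following shows where they can diverge: `.dt.days` reads the day component of each timedelta (which rounds down), while casting the float quotient with `astype(int)` truncates toward zero.

```python
import numpy as np
import pandas as pd

# Hypothetical differences for illustration: +36 hours and -12 hours.
diff = pd.Series(pd.to_timedelta([36, -12], unit='h'))

# Division gives fractional days as floats (1.5 and -0.5).
as_float = diff / np.timedelta64(1, 'D')

by_cast = as_float.astype(int)   # truncates toward zero
by_attr = diff.dt.days           # day component, rounded down

print(by_cast.tolist())  # [1, 0]
print(by_attr.tolist())  # [1, -1]
```

If the columns hold plain dates with no time-of-day part and no negative differences, either option should give the same integers.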

In a larger DataFrame, the first method is faster:

y = pd.concat([y]*1000).reset_index(drop=True)

In [236]: %timeit (y['Datediff'] / np.timedelta64(1, 'D')).astype(int)
1000 loops, best of 3: 789 µs per loop

In [237]: %timeit y['Datediff'].dt.days
100 loops, best of 3: 15.3 ms per loop
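
These timings come from one machine and one pandas version, so the gap may look different elsewhere. A rough sketch for re-running the comparison outside IPython, assuming the standard `timeit` module and the same 2000-row frame (the helper names `by_division` and `by_accessor` are made up for this example):

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the sample frame and repeat it 1000 times, as in the answer.
y = pd.DataFrame({'datePulled': pd.to_datetime(['2016-01-05', '2016-01-04']),
                  'Dates': pd.to_datetime(['2016-01-01', '2016-01-02'])})
y['Datediff'] = y['datePulled'] - y['Dates']
y = pd.concat([y] * 1000).reset_index(drop=True)


def by_division():
    # Divide by one day, then cast the float result to int.
    return (y['Datediff'] / np.timedelta64(1, 'D')).astype(int)


def by_accessor():
    # Read the whole-day component through the .dt accessor.
    return y['Datediff'].dt.days


# Total seconds for 100 calls of each approach.
t_div = timeit.timeit(by_division, number=100)
t_acc = timeit.timeit(by_accessor, number=100)
print(f"division: {t_div:.4f}s  .dt.days: {t_acc:.4f}s")
```

No expected numbers are given here, since the result depends on hardware and library versions.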

Upvotes: 3
