Reputation: 1133
I have a pandas DataFrame with two date columns, and I want the difference between them in days. The resulting difference is displayed like a string (e.g. '7 days'). Is there a way to change this to just the integer number of days?
y['datePulled'] = pd.to_datetime(y['datePulled'])
y['Dates'] = pd.to_datetime(y['Dates'])
y['Datediff'] = y['datePulled'] - y['Dates']
y['Datediff']
0   7 days
1   6 days
2   5 days
3   4 days
4   3 days
5   2 days
6   1 days
Upvotes: 1
Views: 2733
Reputation: 863291
You can use:
(y['Datediff'] / np.timedelta64(1, 'D')).astype(int)
Or:
y['Datediff'].dt.days
Sample:
import pandas as pd
import numpy as np
y = pd.DataFrame({'datePulled': ['2016-01-05', '2016-01-04'],
                  'Dates': ['2016-01-01', '2016-01-02']})
y['datePulled'] = pd.to_datetime(y['datePulled'])
y['Dates'] = pd.to_datetime(y['Dates'])
y['Datediff'] = y['datePulled'] - y['Dates']
print(y)
# dividing by a one-day timedelta yields floats, so cast to int
y['Datediff1'] = (y['Datediff'] / np.timedelta64(1, 'D')).astype(int)
# .dt.days returns the integer days component directly
y['Datediff2'] = y['Datediff'].dt.days
print(y)
       Dates datePulled  Datediff  Datediff1  Datediff2
0 2016-01-01 2016-01-05    4 days          4          4
1 2016-01-02 2016-01-04    2 days          2          2
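The two methods agree for whole-day differences like the sample above, but they can differ once the timestamps carry a time of day and the difference is negative. The sketch below illustrates this; `day_diffs` is a hypothetical helper introduced only for this illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def day_diffs(later, earlier):
    """Hypothetical helper: return day differences computed both ways."""
    delta = pd.to_datetime(pd.Series(later)) - pd.to_datetime(pd.Series(earlier))
    # Method 1: divide by one day (float result), then truncate toward zero.
    by_division = (delta / np.timedelta64(1, 'D')).astype(int)
    # Method 2: take the days component of each timedelta (floored).
    by_accessor = delta.dt.days
    return by_division, by_accessor


# Whole-day differences: both methods give the same integers.
div, acc = day_diffs(['2016-01-05', '2016-01-04'], ['2016-01-01', '2016-01-02'])
print(div.tolist(), acc.tolist())

# A difference of minus 12 hours: the cast truncates -0.5 to 0,
# while .dt.days reports the floored days component, -1.
div_neg, acc_neg = day_diffs(['2016-01-01 00:00'], ['2016-01-01 12:00'])
print(div_neg.tolist(), acc_neg.tolist())
```

If the columns only ever hold dates without times, as in the question, this distinction does not arise and either method should work.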
On a larger DataFrame, the first method was faster in this test:
y = pd.concat([y]*1000).reset_index(drop=True)
In [236]: %timeit (y['Datediff'] / np.timedelta64(1, 'D')).astype(int)
1000 loops, best of 3: 789 µs per loop
In [237]: %timeit y['Datediff'].dt.days
100 loops, best of 3: 15.3 ms per loop
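These timings come from one machine and one pandas version, so the gap may look different elsewhere. A minimal sketch for repeating the comparison outside IPython, using the standard-library `timeit` module (the function names `by_division` and `by_accessor` are made up for this example):

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the sample frame and repeat it 1000 times, as in the test above.
y = pd.DataFrame({'datePulled': pd.to_datetime(['2016-01-05', '2016-01-04']),
                  'Dates': pd.to_datetime(['2016-01-01', '2016-01-02'])})
y = pd.concat([y] * 1000).reset_index(drop=True)
y['Datediff'] = y['datePulled'] - y['Dates']


def by_division():
    # Divide by a one-day timedelta, then cast the floats to int.
    return (y['Datediff'] / np.timedelta64(1, 'D')).astype(int)


def by_accessor():
    # Read the days component through the .dt accessor.
    return y['Datediff'].dt.days


# Average seconds per call over 100 runs of each method.
t_division = timeit.timeit(by_division, number=100) / 100
t_accessor = timeit.timeit(by_accessor, number=100) / 100
print(f"division: {t_division * 1e6:.0f} µs per call")
print(f"dt.days:  {t_accessor * 1e6:.0f} µs per call")
```

No expected numbers are given here on purpose, since they depend on the hardware and library versions in use.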
Upvotes: 3