Reputation: 637
I have two time columns in my dataframe, called date1 and date2. I had always assumed that both are in datetime format. However, I now have to calculate the difference in days between the two, and it doesn't work.
I run the following code to analyse the data:
df['month1'] = pd.DatetimeIndex(df['date1']).month
df['month2'] = pd.DatetimeIndex(df['date2']).month
print(df[["date1", "date2", "month1", "month2"]].head(10))
print(df["date1"].dtype)
print(df["date2"].dtype)
The output is:
       date1       date2  month1  month2
0 2016-02-29  2017-01-01       1       1
1 2016-11-08  2017-01-01       1       1
2 2017-11-27  2009-06-01       1       6
3 2015-03-09  2014-07-01       1       7
4 2015-06-02  2014-07-01       1       7
5 2015-09-18  2017-01-01       1       1
6 2017-09-06  2017-07-01       1       7
7 2017-04-15  2009-06-01       1       6
8 2017-08-14  2014-07-01       1       7
9 2017-12-06  2014-07-01       1       7
datetime64[ns]
object
As you can see, the month for date1 is not calculated correctly! The final operation, which does not work, is:
df["date_diff"] = (df["date1"]-df["date2"]).astype('timedelta64[D]')
which leads to the following error:
incompatible type [object] for a datetime/timedelta operation
I first thought it might be due to date2, so I tried:
df["date2_new"] = pd.to_datetime(df['date2'] - 315619200, unit = 's')
leading to:
unsupported operand type(s) for -: 'str' and 'int'
Does anyone have an idea what I need to change?
Upvotes: 1
Views: 445
Reputation: 153460
Convert both columns with pd.to_datetime, then use the .dt accessor with the days attribute on the difference:
df[['date1','date2']] = df[['date1','date2']].apply(pd.to_datetime)
df['date_diff'] = (df['date1'] - df['date2']).dt.days
Output:
       date1      date2  month1  month2  date_diff
0 2016-02-29 2017-01-01       1       1       -307
1 2016-11-08 2017-01-01       1       1        -54
2 2017-11-27 2009-06-01       1       6       3101
3 2015-03-09 2014-07-01       1       7        251
4 2015-06-02 2014-07-01       1       7        336
5 2015-09-18 2017-01-01       1       1       -471
6 2017-09-06 2017-07-01       1       7         67
7 2017-04-15 2009-06-01       1       6       2875
8 2017-08-14 2014-07-01       1       7       1140
9 2017-12-06 2014-07-01       1       7       1254
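For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The small dataframe is a hypothetical sample that copies the first three rows from the question, and it assumes date2 holds plain strings (which is what the "'str' and 'int'" error suggests). The names is_dt_before and is_dt_after are only there for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the first three rows of the question's data.
# Assumption: date1 is already datetime, date2 holds plain strings.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2016-02-29", "2016-11-08", "2017-11-27"]),
    "date2": ["2017-01-01", "2017-01-01", "2009-06-01"],
})

# Check whether date2 is a real datetime column before converting.
is_dt_before = pd.api.types.is_datetime64_any_dtype(df["date2"])

# Convert both columns; a column that is already datetime is left as datetime.
df[["date1", "date2"]] = df[["date1", "date2"]].apply(pd.to_datetime)
is_dt_after = pd.api.types.is_datetime64_any_dtype(df["date2"])

# Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
df["date_diff"] = (df["date1"] - df["date2"]).dt.days

# Once the column is datetime, .dt.month gives the month directly.
df["month1"] = df["date1"].dt.month

print(is_dt_before, is_dt_after)
print(df)
```

If some strings in the real column cannot be parsed, pd.to_datetime raises by default; passing errors='coerce' turns those entries into NaT instead, which may or may not be what you want.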
Upvotes: 1