Reputation: 79
I have a csv file which is a list of dates as strings e.g. 2018-05-04
I'm trying to make a new column in a dataframe that is the difference in days. I'm having a problem with different formats of date as apparently you can't subtract a pandas to_datetime from a datetime.date
This is what i have
import pandas as pd
import datetime as dt
today=dt.date.today() #<class 'datetime.date'>
df=pd.read_csv('test.csv')
df.columns=['ATH']
df['ATH']=pd.to_datetime(df['ATH']) #<class 'pandas._libs.tslibs.timestamps.Timesta...
df['diff']=df['ATH'].apply(lambda x : abs(today-x).days)
and this is what happens: TypeError: unsupported operand type(s) for -: 'datetime.date' and 'Timestamp'
I can see in the pandas to_datetime documentation that it can return "scalar: Timestamp (or datetime.datetime)" and whilst this must be what's happening, I can't figure out what I should be doing instead.
If I print df after pd.to_datetime it's still just a list of dates
I found d.dtypes and ATH is datetime64[ns]
Upvotes: 1
Views: 13833
Reputation: 4407
You are subtracting a pd.Timestamp
object from a datetime.date
instance. The latter has no notion of time. You should convert today
to a datetime.datetime
object instead:
# Initialises a datetime.datetime instance at midnight of the given date (today).
today = dt.datetime.combine(dt.date.today(), dt.datetime.min.time())
Or get a Timestamp
with Pandas:
today = pd.to_datetime('today').normalize()
Upvotes: 2