Reputation: 251
I have a data frame in pandas which includes the number of days since an event occurred. I want to create a new column that calculates the date of the event by subtracting the number of days from the current date. Every time I attempt to apply pd.offsets.Day or pd.Timedelta I get an error stating that Series are an unsupported type. This also occurs when I use apply. When I use map I receive a runtime error saying "maximum recursion depth exceeded while calling a Python object".
For example, assume my data frame looked like this:
index days_since_event
0 5
1 7
2 3
3 6
4 0
I want to create a new column with the date of the event, so my expected outcome (using today's date of 12/29/2015) is:
index days_since_event event_date
0 5 2015-12-24
1 7 2015-12-22
2 3 2015-12-26
3 6 2015-12-23
4 0 2015-12-29
I have attempted multiple ways to do this, but have received errors for each.
One method I tried was:
now = pd.datetime.date(pd.datetime.now())
df['event_date'] = now - df.days_since_event.apply(pd.offsets.Day)
With this I received an error saying that Series are an unsupported type.
I tried the above with .map instead of .apply, and received the error that "maximum recursion depth exceeded while calling a Python object".
I also attempted to convert the days into timedelta, such as:
df.days_since_event = (dt.timedelta(days = df.days_since_event)).apply
This also raised an error saying that Series is an unsupported type.
Upvotes: 24
Views: 29194
Reputation: 456
Just to follow up with joris' response, you can convert an int or a float into whatever time unit you want with pd.to_timedelta(x, unit=''), changing only the entry for unit=:
# Years, Months, Days ('Y' and 'M' were deprecated in pandas 0.25; recent releases raise ValueError for them):
pd.to_timedelta(3.5, unit='Y') # returns '1095 days 17:27:36'
pd.to_timedelta(3.5, unit='M') # returns '91 days 07:27:18'
pd.to_timedelta(3.5, unit='D') # returns '3 days 12:00:00'
# Hours, Minutes, Seconds:
pd.to_timedelta(3.5, unit='h') # returns '0 days 03:30:00'
pd.to_timedelta(3.5, unit='m') # returns '0 days 00:03:30'
pd.to_timedelta(3.5, unit='s') # returns '0 days 00:00:03.500000'
Note that arithmetic works once the values have been converted to timedeltas:
pd.to_timedelta(3.5, unit='h') - pd.to_timedelta(3.25, unit='h') # returns '0 days 00:15:00'
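For calendar-based shifts such as whole months or years, pd.DateOffset is one option, since it follows actual month lengths instead of a fixed duration. A minimal sketch, assuming the reference date from the question and whole-number months and years (the variable names are only illustrative):

```python
import pandas as pd

# Reference date taken from the question.
ref = pd.Timestamp("2015-12-29")

# Fixed-length units (days, hours, ...) go through to_timedelta.
three_and_half_days_ago = ref - pd.to_timedelta(3.5, unit="D")

# Calendar units go through DateOffset, which respects month lengths.
three_months_ago = ref - pd.DateOffset(months=3)
one_year_ago = ref - pd.DateOffset(years=1)

print(three_and_half_days_ago, three_months_ago, one_year_ago)
```

DateOffset also works against a datetime column, so the same subtraction can be applied to a whole Series.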
Upvotes: 4
Reputation: 139152
First, to convert the column of integers to timedeltas, you can use to_timedelta:
In [60]: pd.to_timedelta(df['days_since_event'], unit='D')
Out[60]:
0 5 days
1 7 days
2 3 days
3 6 days
4 0 days
Name: days_since_event, dtype: timedelta64[ns]
Then you can create a new column with the current date and subtract those timedeltas:
In [62]: df['event_date'] = pd.Timestamp('2015-12-29')
In [63]: df['event_date'] = df['event_date'] - pd.to_timedelta(df['days_since_event'], unit='D')
In [64]: df['event_date']
Out[64]:
0 2015-12-24
1 2015-12-22
2 2015-12-26
3 2015-12-23
4 2015-12-29
dtype: datetime64[ns]
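The two steps can also be combined, because a scalar Timestamp broadcasts against a timedelta Series. A self-contained sketch, assuming the sample data from the question and a fixed reference date so that the result is reproducible:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"days_since_event": [5, 7, 3, 6, 0]})

# Fixed reference date from the question; for the real current date,
# pd.Timestamp.today().normalize() gives today at midnight.
today = pd.Timestamp("2015-12-29")

# Convert the integer column to timedeltas and subtract from the scalar date.
df["event_date"] = today - pd.to_timedelta(df["days_since_event"], unit="D")

print(df)
```

This avoids filling a helper column with the current date first; the result has the same datetime64 dtype.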
Upvotes: 27