Reputation: 51
I want to create a new column DATE in the dataframe transaction from the column DAY (lists days from beginning of the study). Like, if DAY is 1, I want DATE to show 2014-1-1, or if DAY is 12 it should be 2014-1-12. This is what I am doing to convert days into date and it works:
import datetime
def serial_date_to_string(srl_no):
new_date = datetime.datetime(2014,1,1,0,0) + datetime.timedelta(srl_no - 1)
return new_date.strftime("%Y-%m-%d")
However, when I try to use this formula to add a new column, it doesn't work:
transaction['DATE'] = serial_date_to_string(transaction['DAY'])
TypeError: unsupported type for timedelta days component: Series
But the DAY column type is int64. I tried to search on forums and found that the formula could be adjusted, if I try to use this:
def serial_date_to_string(srl_no):
new_date = datetime.datetime(2014,1,1,0,0) + (srl_no - 1).map(datetime.timedelta)
return new_date.strftime("%Y-%m-%d")
It still gives AttributeError: 'Series' object has no attribute 'strftime'.
Thank you for any help!
Upvotes: 1
Views: 2104
Reputation: 863056
Use to_datetime
with parameters origin
for starting day and unit=d
for days
:
df = pd.DataFrame({'DAY':[1,12,20]})
df['date'] = pd.to_datetime(df['DAY'] - 1, origin='2014-01-01', unit='d')
print (df)
DAY date
0 1 2014-01-01
1 12 2014-01-12
2 20 2014-01-20
For same ouput add Series.dt.strftime
:
df['date'] = pd.to_datetime(df['DAY'] - 1, origin='2014-01-01', unit='d').dt.strftime("%Y-%m-%d")
print (df)
DAY date
0 1 2014-01-01
1 12 2014-01-12
2 20 2014-01-20
EDIT:
For your function is possible use Series.apply
:
transaction['DATE'] = transaction['DAY'].apply(serial_date_to_string)
Performance is similar, apply
here is fastest for 10k rows:
import datetime
def serial_date_to_string(srl_no):
new_date = datetime.datetime(2014,1,1,0,0) + datetime.timedelta(srl_no - 1)
return new_date.strftime("%Y-%m-%d")
np.random.seed(2021)
df = pd.DataFrame({'DAY': np.random.randint(10, 1000, size=10000)})
In [17]: %timeit pd.to_datetime(df['DAY'] - 1, origin='2014-01-01', unit='d').dt.strftime("%Y-%m-%d")
79.4 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [18]: %timeit df['DAY'].apply(serial_date_to_string)
57.1 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [19]: %timeit df['DAY'].map(serial_date_to_string)
64.7 ms ± 5.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 1
Reputation: 294468
Use map
transaction['DATE'] = transaction['DAY'].map(serial_date_to_string)
Upvotes: 2