Reputation: 8980
I need to add a column to a dataframe, so that row 0 is 15-Feb-2019. row 1 is 16th, etc. I have tried using the index:
import numpy as np
import pandas as pd
df=pd.DataFrame()
df['a']=np.arange(10,20)
df['date from index']=df.apply( lambda x: pd.to_datetime('15-2-2019') + pd.DateOffset(days=x.index), axis=1 )
but I get:
TypeError: ('must be str, not int', 'occurred at index 0')
which I admit I do not understand. I tried creating an explicit column to use instead of the index:
df=pd.DataFrame()
df['a']=np.arange(10,20)
df['counter']=np.arange(0,df.shape[0])
df['date from counter']=df.apply( lambda x: pd.to_datetime('15-2-2019') + pd.DateOffset(days=x['counter']), axis=1 )
but this gives me:
TypeError: ('unsupported type for timedelta days component: numpy.int32', 'occurred at index 0')
What am I doing wrong?
Upvotes: 2
Views: 719
Reputation: 403278
You can vectorise this with pd.to_timedelta
:
# pd.to_timedelta(df.index, unit='d') + pd.to_datetime('15-2-2019') # whichever
pd.to_timedelta(df.a, unit='d') + pd.to_datetime('15-2-2019')
0 2019-02-25
1 2019-02-26
2 2019-02-27
3 2019-02-28
4 2019-03-01
5 2019-03-02
6 2019-03-03
7 2019-03-04
8 2019-03-05
9 2019-03-06
Name: a, dtype: datetime64[ns]
df['date_from_counter'] = (
pd.to_timedelta(df.a, unit='d') + pd.to_datetime('15-2-2019'))
df
a counter date_from_counter
0 10 0 2019-02-25
1 11 1 2019-02-26
2 12 2 2019-02-27
3 13 3 2019-02-28
4 14 4 2019-03-01
5 15 5 2019-03-02
6 16 6 2019-03-03
7 17 7 2019-03-04
8 18 8 2019-03-05
9 19 9 2019-03-06
As expected, you can call pd.to_timedelta
on whatever column of integers with the right unit, and then use the resultant Timedelta
column for date time arithmetic.
For your code to work, it seems like you needed to pass int
, not np.int
(not sure why). This works.
dt = pd.to_datetime('15-2-2019')
df['date from counter'] = df.apply(
lambda x: dt + pd.DateOffset(days=x['counter'].item()), axis=1)
df
a counter date from counter
0 10 0 2019-02-15
1 11 1 2019-02-16
2 12 2 2019-02-17
3 13 3 2019-02-18
4 14 4 2019-02-19
5 15 5 2019-02-20
6 16 6 2019-02-21
7 17 7 2019-02-22
8 18 8 2019-02-23
9 19 9 2019-02-24
Upvotes: 2
Reputation: 863791
Use to_timedelta
for convert values to day timedeltas or use
parameter origin
with specify start day with parameter unit
in to_datetime
:
df['date from index']= pd.to_datetime('15-2-2019') + pd.to_timedelta(df.index, 'd')
df['date from counter']= pd.to_datetime('15-2-2019') + pd.to_timedelta(df['counter'], 'd')
df['date from index1']= pd.to_datetime(df.index, origin='15-02-2019', unit='d')
df['date from counter1']= pd.to_datetime(df['counter'], origin='15-02-2019', unit='d')
print(df.head())
a counter date from index date from counter date from index1 \
0 10 0 2019-02-15 2019-02-15 2019-02-15
1 11 1 2019-02-16 2019-02-16 2019-02-16
2 12 2 2019-02-17 2019-02-17 2019-02-17
3 13 3 2019-02-18 2019-02-18 2019-02-18
4 14 4 2019-02-19 2019-02-19 2019-02-19
date from counter1
0 2019-02-15
1 2019-02-16
2 2019-02-17
3 2019-02-18
4 2019-02-19
Upvotes: 3