Pythonista anonymous
Pythonista anonymous

Reputation: 8980

Python pandas: how to create a column which is a fixed date + the # days in another column

I need to add a column to a dataframe, so that row 0 is 15-Feb-2019. row 1 is 16th, etc. I have tried using the index:

import numpy as np
import pandas as pd
df=pd.DataFrame()
df['a']=np.arange(10,20)
df['date from index']=df.apply( lambda x: pd.to_datetime('15-2-2019') + pd.DateOffset(days=x.index), axis=1 )

but I get:

TypeError: ('must be str, not int', 'occurred at index 0')

which I admit I do not understand. I tried creating an explicit column to use instead of the index:

df=pd.DataFrame()
df['a']=np.arange(10,20)
df['counter']=np.arange(0,df.shape[0])
df['date from counter']=df.apply( lambda x: pd.to_datetime('15-2-2019') + pd.DateOffset(days=x['counter']), axis=1 )

but this gives me:

TypeError: ('unsupported type for timedelta days component: numpy.int32', 'occurred at index 0')

What am I doing wrong?

Upvotes: 2

Views: 719

Answers (2)

cs95
cs95

Reputation: 403278

You can vectorise this with pd.to_timedelta:

# pd.to_timedelta(df.index, unit='d') + pd.to_datetime('15-2-2019') # whichever
pd.to_timedelta(df.a, unit='d') + pd.to_datetime('15-2-2019')

0   2019-02-25
1   2019-02-26
2   2019-02-27
3   2019-02-28
4   2019-03-01
5   2019-03-02
6   2019-03-03
7   2019-03-04
8   2019-03-05
9   2019-03-06
Name: a, dtype: datetime64[ns]

df['date_from_counter'] = (
    pd.to_timedelta(df.a, unit='d') + pd.to_datetime('15-2-2019'))
df

    a  counter date_from_counter
0  10        0        2019-02-25
1  11        1        2019-02-26
2  12        2        2019-02-27
3  13        3        2019-02-28
4  14        4        2019-03-01
5  15        5        2019-03-02
6  16        6        2019-03-03
7  17        7        2019-03-04
8  18        8        2019-03-05
9  19        9        2019-03-06

As expected, you can call pd.to_timedelta on whatever column of integers with the right unit, and then use the resultant Timedelta column for date time arithmetic.


For your code to work, it seems like you needed to pass int, not np.int (not sure why). This works.

dt = pd.to_datetime('15-2-2019')
df['date from counter'] = df.apply(
    lambda x: dt + pd.DateOffset(days=x['counter'].item()), axis=1)
df

    a  counter date from counter
0  10        0        2019-02-15
1  11        1        2019-02-16
2  12        2        2019-02-17
3  13        3        2019-02-18
4  14        4        2019-02-19
5  15        5        2019-02-20
6  16        6        2019-02-21
7  17        7        2019-02-22
8  18        8        2019-02-23
9  19        9        2019-02-24

Upvotes: 2

jezrael
jezrael

Reputation: 863791

Use to_timedelta for convert values to day timedeltas or use parameter origin with specify start day with parameter unit in to_datetime:

df['date from index']= pd.to_datetime('15-2-2019') + pd.to_timedelta(df.index, 'd')
df['date from counter']= pd.to_datetime('15-2-2019') + pd.to_timedelta(df['counter'], 'd')

df['date from index1']= pd.to_datetime(df.index, origin='15-02-2019', unit='d')
df['date from counter1']= pd.to_datetime(df['counter'], origin='15-02-2019', unit='d')
print(df.head())
    a  counter date from index date from counter date from index1  \
0  10        0      2019-02-15        2019-02-15       2019-02-15   
1  11        1      2019-02-16        2019-02-16       2019-02-16   
2  12        2      2019-02-17        2019-02-17       2019-02-17   
3  13        3      2019-02-18        2019-02-18       2019-02-18   
4  14        4      2019-02-19        2019-02-19       2019-02-19   

  date from counter1  
0         2019-02-15  
1         2019-02-16  
2         2019-02-17  
3         2019-02-18  
4         2019-02-19  

Upvotes: 3

Related Questions