Gorakhnath nigam

Reputation: 37

Adding random number of days to a series of datetime values

I am trying to add a random number of days to each value in a series of datetime values without iterating over every row of the dataframe, because iterating takes a long time (I have a large dataframe). I looked at datetime's timedelta, pandas DateOffset, etc., but they do not accept a list of random day counts in a single call; the random numbers have to be supplied one at a time.

code: df['date_columnA'] = df['date_columnB'] + datetime.timedelta(days = n)

The code above adds the same number of days, n, to every row, whereas I want a different random number of days added to each row.

Upvotes: 3

Views: 940

Answers (3)

jezrael

Reputation: 862691

If performance is important, create all the random timedeltas at once with to_timedelta and numpy.random.randint, then add them to the column:

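import numpy as np
import pandas as pd

np.random.seed(2020)

df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=20)})

# One random offset per row, from 1 to 99 days (randint excludes the upper bound)
td = pd.to_timedelta(np.random.randint(1,100, size=len(df)), unit='d')
df['date_columnA'] = df['date_columnB'] + td
print(df)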
   date_columnB date_columnA
0    2015-01-01   2015-04-08
1    2015-01-02   2015-01-11
2    2015-01-03   2015-03-12
3    2015-01-04   2015-03-13
4    2015-01-05   2015-04-07
5    2015-01-06   2015-01-10
6    2015-01-07   2015-03-20
7    2015-01-08   2015-03-06
8    2015-01-09   2015-02-08
9    2015-01-10   2015-02-28
10   2015-01-11   2015-02-13
11   2015-01-12   2015-02-06
12   2015-01-13   2015-03-29
13   2015-01-14   2015-01-24
14   2015-01-15   2015-03-08
15   2015-01-16   2015-01-28
16   2015-01-17   2015-03-14
17   2015-01-18   2015-03-22
18   2015-01-19   2015-03-28
19   2015-01-20   2015-03-31

Performance for 10k rows:

np.random.seed(2020)

df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=10000)})

In [357]: %timeit df['date_columnA'] = df['date_columnB'].apply(lambda x:x+timedelta(days=random.randint(0,100)))
158 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [358]: %timeit df['date_columnA1'] = df['date_columnB'] + pd.to_timedelta(np.random.randint(1,100, size=len(df)), unit='d')
1.53 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
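
For reference, the vectorized approach above can be wrapped in a small self-contained helper. This is only a sketch under a few assumptions: it uses NumPy's newer Generator API (np.random.default_rng) instead of np.random.seed, and the helper name add_random_days and its parameters are made up for illustration. Like np.random.randint, rng.integers excludes the upper bound, whereas random.randint in the apply version includes it, so the two timed snippets do not draw from exactly the same range.

```python
import numpy as np
import pandas as pd


def add_random_days(dates, low, high, seed=None):
    """Return `dates` shifted by a random whole number of days in [low, high).

    Hypothetical helper: the name and parameters are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Draw one integer offset per row in a single vectorized call
    days = rng.integers(low, high, size=len(dates))
    # Convert the integers to timedeltas and add them element-wise
    return dates + pd.to_timedelta(days, unit="D")


df = pd.DataFrame({"date_columnB": pd.date_range("2015-01-01", periods=20)})
df["date_columnA"] = add_random_days(df["date_columnB"], 1, 100, seed=2020)
```

Passing a fixed seed makes the offsets reproducible; leaving it as None gives different offsets on each run.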

Upvotes: 4

PASUMPON V N

Reputation: 1186

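import numpy as np
import pandas as pd

# Sample one random day offset per row with np.random.choice
# (the 0 to 100 day range is an assumption), then add the offsets as timedeltas
offsets = np.random.choice(np.arange(0, 101), size=len(df))
df['date_columnA'] = df['date_columnB'] + pd.to_timedelta(offsets, unit='D')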

Upvotes: 0

Amir.S

Reputation: 745

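import random
from datetime import timedelta

# Row by row: add a random offset of 0 to 100 days (inclusive) to each date
df['date_columnA'] = df['date_columnB'].apply(lambda x: x + timedelta(days=random.randint(0, 100)))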

Upvotes: 3
