Reputation: 37
I am trying to add a random number of days to a series of datetime values without iterating over each row of the DataFrame, because iterating takes a long time (I have a large DataFrame). I looked at datetime's timedelta, pandas DateOffset, etc., but they do not seem to accept a list of random day counts all at once; the random numbers have to be supplied one by one.
code: df['date_columnA'] = df['date_columnB'] + datetime.timedelta(days = n)
The code above adds the same number of days, n, to every row, whereas I want a different random number added to each row.
Upvotes: 3
Views: 940
Reputation: 862691
If performance is important, create all the random timedeltas at once with to_timedelta and numpy.random.randint, then add them to the column:
import numpy as np
import pandas as pd

np.random.seed(2020)
df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=20)})
td = pd.to_timedelta(np.random.randint(1,100, size=len(df)), unit='d')
df['date_columnA'] = df['date_columnB'] + td
print (df)
date_columnB date_columnA
0 2015-01-01 2015-04-08
1 2015-01-02 2015-01-11
2 2015-01-03 2015-03-12
3 2015-01-04 2015-03-13
4 2015-01-05 2015-04-07
5 2015-01-06 2015-01-10
6 2015-01-07 2015-03-20
7 2015-01-08 2015-03-06
8 2015-01-09 2015-02-08
9 2015-01-10 2015-02-28
10 2015-01-11 2015-02-13
11 2015-01-12 2015-02-06
12 2015-01-13 2015-03-29
13 2015-01-14 2015-01-24
14 2015-01-15 2015-03-08
15 2015-01-16 2015-01-28
16 2015-01-17 2015-03-14
17 2015-01-18 2015-03-22
18 2015-01-19 2015-03-28
19 2015-01-20 2015-03-31
Performance for 10k rows:
np.random.seed(2020)
df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=10000)})
In [357]: %timeit df['date_columnA'] = df['date_columnB'].apply(lambda x:x+timedelta(days=random.randint(0,100)))
158 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [358]: %timeit df['date_columnA1'] = df['date_columnB'] + pd.to_timedelta(np.random.randint(1,100, size=len(df)), unit='d')
1.53 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
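The vectorized approach above can be wrapped in a small helper. This is only a sketch: the function name add_random_days and its parameters are hypothetical, and it swaps in NumPy's newer Generator API (np.random.default_rng) in place of np.random.seed, which is an assumption about preference rather than something the answer requires.

```python
import numpy as np
import pandas as pd


def add_random_days(dates: pd.Series, low: int, high: int, seed=None) -> pd.Series:
    """Return dates shifted by a random whole number of days in [low, high)."""
    # A local generator keeps the randomness reproducible without global state.
    rng = np.random.default_rng(seed)
    # One integer per row, drawn in a single vectorized call (high is excluded).
    days = rng.integers(low, high, size=len(dates))
    # Convert the integers to timedeltas and add them element-wise.
    return dates + pd.to_timedelta(days, unit="D")


df = pd.DataFrame({"date_columnB": pd.date_range("2015-01-01", periods=20)})
df["date_columnA"] = add_random_days(df["date_columnB"], 1, 100, seed=2020)
print(df.head())
```

Because the generator is seeded differently from np.random.seed, the dates it produces will not match the output printed above, even with the same seed value.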
Upvotes: 4
Reputation: 1186
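import numpy as np
import pandas as pd
# Dates cannot be added to dates, so sample day offsets instead of sampling from a date_range.
# The 1 to 100 day range is an assumed example; adjust it as needed.
df['date_columnA'] = df['date_columnB'] + pd.to_timedelta(np.random.choice(np.arange(1, 101), len(df)), unit='D')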
Upvotes: 0
Reputation: 745
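import random
from datetime import timedelta
# Adds a random 0 to 100 days (both ends inclusive) to each row, one row at a time
df['date_columnA'] = df['date_columnB'].apply(lambda x: x + timedelta(days=random.randint(0, 100)))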
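One detail worth noting when comparing this with the NumPy version timed earlier in the thread: random.randint(0, 100) includes both endpoints, while np.random.randint(1, 100) excludes the upper bound, so the two draw from slightly different ranges. A minimal sketch of a vectorized call that covers the same 0 to 100 range, assuming that range is what is wanted (the 1000-row frame is just illustrative data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date_columnB": pd.date_range("2015-01-01", periods=1000)})

# random.randint(0, 100) can return 100, so the NumPy upper bound must be 101
# to cover the same range (NumPy excludes the upper bound).
days = np.random.randint(0, 101, size=len(df))
df["date_columnA"] = df["date_columnB"] + pd.to_timedelta(days, unit="D")

# Inspect the smallest and largest offsets that were actually drawn.
offsets = (df["date_columnA"] - df["date_columnB"]).dt.days
print(offsets.min(), offsets.max())
```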
Upvotes: 3