Reputation: 31
I got this code below which works fine but takes too much time to execute (>30s):
df['Date_engagement'] = np.nan
for i in range(df.shape[0]):
if not(pd.isna(df.loc[i,'Engagement'])):
df.loc[i, 'Date_engagement'] = np.busday_offset(
df.loc[i, 'Date_réception'].date(),
df.loc[i,'Engagement'], roll='backward', busdaycal=france_bdc
)
I guess this is because I loop through the Pandas dataframe, and I wonder if there is a way to achieve the same result, not using any loop.
Basically what this code does is creating the 'Date_engagement' column which will be the result of adding 'Engagement' days to 'Date_réception' if *'Engagement' *is not empty. I use *'np.busday_offset()' *to avoid holidays.
Thanks in advance for your help...
I tried to use Pandas features for dataframes (like DataFrame.apply()), but can't find how to do it correctly to get the same result as explained.
Upvotes: 1
Views: 147
Reputation: 37902
Not the most efficient one but you can still try it :
dt_arr = np.busday_offset(
df["Date_réception"].to_numpy().astype("datetime64[D]"),
df["Engagement"].fillna(0), roll="backward", busdaycal=france_bdc)
df["Date_engagement"] = pd.Series(dt_arr).where(df["Engagement"].notna())
#620 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Example :
np.random.seed(1)
df = pd.DataFrame({
"Date_réception": pd.date_range("20230401", periods=30, freq="W"),
"Engagement": np.random.choice(np.append(np.random.randint(5, 26, 25), [np.nan]),
size=30, replace=True)}
)
france_bdc = np.busdaycalendar(
holidays=["2023-01-01", "2023-04-17", "2023-05-01",
"2023-05-08", "2023-07-14", "2023-08-15",
"2023-11-01", "2023-11-11", "2023-12-25"]
)
Output :
print(df)
Date_réception Engagement Date_engagement
0 2023-04-02 23.0 2023-05-05
1 2023-04-09 23.0 2023-05-15
2 2023-04-16 21.0 2023-05-18
3 2023-04-23 10.0 2023-05-09
4 2023-04-30 23.0 2023-06-02
5 2023-05-07 10.0 2023-05-22
6 2023-05-14 19.0 2023-06-08
7 2023-05-21 18.0 2023-06-14
8 2023-05-28 21.0 2023-06-26
9 2023-06-04 21.0 2023-07-03
10 2023-06-11 20.0 2023-07-07
11 2023-06-18 19.0 2023-07-13
12 2023-06-25 NaN NaT
13 2023-07-02 16.0 2023-07-25
14 2023-07-09 10.0 2023-07-24
15 2023-07-16 10.0 2023-07-28
16 2023-07-23 5.0 2023-07-28
17 2023-07-30 9.0 2023-08-10
18 2023-08-06 18.0 2023-08-31
19 2023-08-13 25.0 2023-09-18
20 2023-08-20 23.0 2023-09-20
21 2023-08-27 6.0 2023-09-04
22 2023-09-03 NaN NaT
23 2023-09-10 5.0 2023-09-15
24 2023-09-17 20.0 2023-10-13
25 2023-09-24 13.0 2023-10-11
26 2023-10-01 10.0 2023-10-13
27 2023-10-08 15.0 2023-10-27
28 2023-10-15 10.0 2023-10-27
29 2023-10-22 NaN NaT
Upvotes: 0
Reputation: 120559
Suppose the input dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date_réception': ['2023-05-12', '2023-05-13'],
'Engagement': [10, np.nan]})
print(df)
# Output
Date_réception Engagement
0 2023-05-12 10.0
1 2023-05-13 NaN
You can vectorize your code:
france_bdc = np.busdaycalendar(weekmask='1111100', holidays=['2023-05-18'])
m = df['Engagement'].notna()
df.loc[m, 'Date_engagement'] = (
np.busday_offset(df.loc[m, 'Date_réception'].values.astype('datetime64[D]'),
df.loc[m, 'Engagement'], roll='backward', busdaycal=france_bdc)
)
Output:
>>> df
Date_réception Engagement Date_engagement
0 2023-05-12 10.0 2023-05-29
1 2023-05-13 NaN NaT
Upvotes: 3
Reputation: 1059
Maybe this will work:
mask = df['Engagement'].notnull()
df.loc[mask, 'Date_engagement'] = np.busday_offset(
df.loc[mask, 'Date_réception'].dt.date,
df.loc[mask, 'Engagement'],
roll='backward',
busdaycal=france_bdc
)
Upvotes: 0