leboma241
leboma241

Reputation: 31

Python : my code is slow, using for loop with Pandas dataframe

I got this code below which works fine but takes too much time to execute (>30s):

df['Date_engagement'] = np.nan
for i in range(df.shape[0]):
if not(pd.isna(df.loc[i,'Engagement'])):
    df.loc[i, 'Date_engagement'] = np.busday_offset(
        df.loc[i, 'Date_réception'].date(),
        df.loc[i,'Engagement'], roll='backward', busdaycal=france_bdc
    )

I guess this is because I loop through the Pandas dataframe, and I wonder if there is a way to achieve the same result, not using any loop.

Basically what this code does is creating the 'Date_engagement' column which will be the result of adding 'Engagement' days to 'Date_réception' if *'Engagement' *is not empty. I use *'np.busday_offset()' *to avoid holidays.

Thanks in advance for your help...

I tried to use Pandas features for dataframes (like DataFrame.apply()), but can't find how to do it correctly to get the same result as explained.

Upvotes: 1

Views: 147

Answers (3)

Timeless
Timeless

Reputation: 37902

Not the most efficient one but you can still try it :

dt_arr = np.busday_offset(
            df["Date_réception"].to_numpy().astype("datetime64[D]"),
            df["Engagement"].fillna(0), roll="backward", busdaycal=france_bdc)

df["Date_engagement"] = pd.Series(dt_arr).where(df["Engagement"].notna())

#620 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Example :

np.random.seed(1)

df = pd.DataFrame({
    "Date_réception": pd.date_range("20230401", periods=30, freq="W"),
    "Engagement": np.random.choice(np.append(np.random.randint(5, 26, 25), [np.nan]),
                                   size=30, replace=True)}
)

france_bdc = np.busdaycalendar(
                holidays=["2023-01-01", "2023-04-17", "2023-05-01",
                          "2023-05-08", "2023-07-14", "2023-08-15",
                          "2023-11-01", "2023-11-11", "2023-12-25"]
)

Output :

print(df)

   Date_réception  Engagement Date_engagement
0      2023-04-02        23.0      2023-05-05
1      2023-04-09        23.0      2023-05-15
2      2023-04-16        21.0      2023-05-18
3      2023-04-23        10.0      2023-05-09
4      2023-04-30        23.0      2023-06-02
5      2023-05-07        10.0      2023-05-22
6      2023-05-14        19.0      2023-06-08
7      2023-05-21        18.0      2023-06-14
8      2023-05-28        21.0      2023-06-26
9      2023-06-04        21.0      2023-07-03
10     2023-06-11        20.0      2023-07-07
11     2023-06-18        19.0      2023-07-13
12     2023-06-25         NaN             NaT
13     2023-07-02        16.0      2023-07-25
14     2023-07-09        10.0      2023-07-24
15     2023-07-16        10.0      2023-07-28
16     2023-07-23         5.0      2023-07-28
17     2023-07-30         9.0      2023-08-10
18     2023-08-06        18.0      2023-08-31
19     2023-08-13        25.0      2023-09-18
20     2023-08-20        23.0      2023-09-20
21     2023-08-27         6.0      2023-09-04
22     2023-09-03         NaN             NaT
23     2023-09-10         5.0      2023-09-15
24     2023-09-17        20.0      2023-10-13
25     2023-09-24        13.0      2023-10-11
26     2023-10-01        10.0      2023-10-13
27     2023-10-08        15.0      2023-10-27
28     2023-10-15        10.0      2023-10-27
29     2023-10-22         NaN             NaT

Upvotes: 0

Corralien
Corralien

Reputation: 120559

Suppose the input dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date_réception': ['2023-05-12', '2023-05-13'],
                   'Engagement': [10, np.nan]})
print(df)

# Output
  Date_réception  Engagement
0     2023-05-12        10.0
1     2023-05-13         NaN

You can vectorize your code:

france_bdc = np.busdaycalendar(weekmask='1111100', holidays=['2023-05-18'])
m = df['Engagement'].notna()

df.loc[m, 'Date_engagement'] = (
    np.busday_offset(df.loc[m, 'Date_réception'].values.astype('datetime64[D]'),
                     df.loc[m, 'Engagement'], roll='backward', busdaycal=france_bdc)
)

Output:

>>> df
  Date_réception  Engagement Date_engagement
0     2023-05-12        10.0      2023-05-29
1     2023-05-13         NaN             NaT

Upvotes: 3

Gedas Miksenas
Gedas Miksenas

Reputation: 1059

Maybe this will work:

mask = df['Engagement'].notnull()

df.loc[mask, 'Date_engagement'] = np.busday_offset(
    df.loc[mask, 'Date_réception'].dt.date,
    df.loc[mask, 'Engagement'],
    roll='backward',
    busdaycal=france_bdc
)

Upvotes: 0

Related Questions