ojp

Reputation: 1033

How to vectorize an expensive for loop in Python

I have a pandas column that I have initialized with ones. This column represents the health of a solar panel.

I need to decay this value linearly, except at the hour when the panel is replaced, where the value resets to 1 (which is why I initialized the column to ones). Currently I loop through the column and set each value to the previous value minus a constant.

This loop is extremely slow, and I have over 200,000 samples. I am hoping someone can help me with a vectorized solution that avoids the for loop. Here is my code:

def set_degredation_factors_pv(df):
  for i in df.index:
    if i != replacement_duration_PV_year * hour_per_year and i != 0:
      df.loc[i, 'degradation_factor_PV_power_frac'] = df.loc[i-1, 'degradation_factor_PV_power_frac'] - degradation_rate_PV_power_perc_per_hour/100
  return df

Variables:

replacement_duration_PV_year = 25
hour_per_year = 8760
degradation_rate_PV_power_perc_per_hour = 5.479e-5

Input data:

      time_datetime        degradation_factor_PV_power_frac
0     2022-01-01 00:00:00  1
1     2022-01-01 01:00:00  1
2     2022-01-01 02:00:00  1
3     2022-01-01 03:00:00  1
4     2022-01-01 04:00:00  1
...   ...                  ...
8732  2022-12-30 20:00:00  1
8733  2022-12-30 21:00:00  1
8734  2022-12-30 22:00:00  1
8735  2022-12-30 23:00:00  1
8736  2022-12-31 00:00:00  1

Output data (showing only one year of the time range):

      time_datetime        degradation_factor_PV_power_frac
0     2022-01-01 00:00:00  1.000000
1     2022-01-01 01:00:00  0.999999
2     2022-01-01 02:00:00  0.999999
3     2022-01-01 03:00:00  0.999998
4     2022-01-01 04:00:00  0.999998
...   ...                  ...
8732  2022-12-30 20:00:00  0.995216
8733  2022-12-30 21:00:00  0.995215
8734  2022-12-30 22:00:00  0.995215
8735  2022-12-30 23:00:00  0.995214
8736  2022-12-31 00:00:00  0.995214

Upvotes: 1

Views: 55

Answers (1)

Corralien

Reputation: 120489

Try:

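rate = degradation_rate_PV_power_perc_per_hour / 100

# True on the rows that reset to 1: the first row and the replacement hour
mask = ~((df.index != replacement_duration_PV_year * hour_per_year)
         & (df.index != 0))

# Each reset starts a new group. Within a group, the running sum of `rate`
# gives the linear decay; this relies on the column being initialized to ones.
df['degradation_factor_PV_power_frac'] = (
    df.groupby(mask.cumsum())['degradation_factor_PV_power_frac']
      .transform(lambda x: 1 - x.shift().mul(rate).cumsum())
      .fillna(df['degradation_factor_PV_power_frac'])
)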

Output:

>>> df
         time_datetime  degradation_factor_PV_power_frac
0  2022-01-01 00:00:00                          1.000000
1  2022-01-01 01:00:00                          0.999999
2  2022-01-01 02:00:00                          0.999999
3  2022-01-01 03:00:00                          0.999998
4  2022-01-01 04:00:00                          0.999998
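
To sanity-check a vectorized version against the original loop, it may help to try both on a tiny synthetic frame first. The following is only a minimal sketch: the helper names `linear_decay_with_resets` and `loop_reference`, the 10-row size, and the reset at row 6 are made up for illustration and are not from the question. It assumes the column starts at ones, so the decay can be written in closed form as 1 - rate * (rows since the last reset).

```python
import numpy as np
import pandas as pd


def linear_decay_with_resets(n_rows, reset_rows, rate):
    """Start at 1, drop by `rate` per row, jump back to 1 on each reset row.

    Hypothetical helper for illustration; assumes a 0..n_rows-1 integer index.
    """
    index = pd.RangeIndex(n_rows)
    # True on row 0 and on every replacement row
    is_reset = (index == 0) | index.isin(reset_rows)
    # Each reset opens a new group; cumcount() numbers rows in a group from 0
    rows_since_reset = (
        pd.Series(0, index=index).groupby(is_reset.cumsum()).cumcount()
    )
    return 1 - rate * rows_since_reset


def loop_reference(n_rows, reset_rows, rate):
    """Row-by-row recurrence as described in the question, for comparison only."""
    values = np.ones(n_rows)
    for i in range(1, n_rows):
        if i not in reset_rows:
            values[i] = values[i - 1] - rate
    return pd.Series(values)


rate = 5.479e-5 / 100  # per-hour rate from the question, as a fraction
fast = linear_decay_with_resets(10, reset_rows=[6], rate=rate)
slow = loop_reference(10, reset_rows=[6], rate=rate)

# Both versions should agree up to floating-point rounding
assert np.allclose(fast, slow)
```

If the two agree on the small case, the same comparison can be repeated on a slice of the real data before dropping the loop.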

Upvotes: 1
